[#28] DeepMind finds better ways to compress data using small language models
Plus RAG for tables, suicide risk detection on social media, and how chain-of-thought prompting improves LLM reasoning
Hello readers, in this issue we cover:
DeepMind finds better ways to compress data using small language models
TableRAG: an improved RAG framework for tabular data
How Step-by-Step Reasoning Affects AI's Ability to Solve Puzzles
LLMs for Suicide Detection on Social Media
Understanding and Detecting Errors in LLMs
🧠 DeepMind finds better ways to compress data using small language models
Recent studies show that foundation models can compress data, but their large size makes them less efficient than standard algorithms. Simply shrinking a model doesn't always help, because compression quality tends to drop with it. To find a better balance between size and performance, the researchers trained models on 165GB of data spanning text, images, and audio. They found that small models (with millions of parameters) can outperform common compression tools such as gzip, JPEG 2000, and FLAC, with the largest gains on audio. However, the models are less effective on types of data they haven't seen before.
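A language model can be turned into a compressor by driving an entropy coder (such as an arithmetic coder) with its next-token probabilities, in which case the compressed size is roughly the sum of -log2 p(token) bits. The sketch below is not the paper's setup; it just estimates that bound with an off-the-shelf small model (gpt2 as a stand-in) and compares it against zlib, the algorithm behind gzip. As the summary above notes, the model's own weights are not counted here, which is exactly why very large models lose to standard tools in practice.

```python
import math
import zlib

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: any small causal LM works for this sketch; gpt2 is a stand-in,
# not the model trained in the paper.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog. " * 20

# Theoretical compressed size under the model: sum of -log2 p(next token),
# i.e. roughly what an arithmetic coder driven by the model would emit.
ids = tokenizer(text, return_tensors="pt").input_ids
with torch.no_grad():
    logits = model(ids).logits
log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
token_log_probs = log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).squeeze(-1)
lm_bits = -token_log_probs.sum().item() / math.log(2)

# Baseline: a standard general-purpose compressor.
gzip_bits = len(zlib.compress(text.encode("utf-8"))) * 8

print(f"LM bound: {lm_bits / 8:.0f} bytes (model weights not counted)")
print(f"zlib:     {gzip_bits / 8:.0f} bytes")
```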
🗓️ TableRAG: RAG framework for tables
TableRAG performs better than existing RAG methods for tabular data
Advancements in language models (LMs) have improved their ability to work with tables, but they often struggle with large tables because of limits on how much data they can process at once (their context length). To address this, the authors developed TableRAG, a system that retrieves only the important parts of a table before giving them to the LM, making the process faster and more efficient. Tested on large datasets, TableRAG achieves top performance on large-scale table understanding.
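The paper's retrieval scheme isn't reproduced here, but the core idea, pulling out only the relevant pieces of a table instead of stuffing the whole thing into the prompt, can be sketched with a simple similarity search. The TF-IDF ranking and the toy table below are stand-ins for illustration, not TableRAG's actual components.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy table; a real one could be far too large to fit in an LM's context.
columns = ["country", "capital", "population_millions", "gdp_usd_billions"]
cells = ["France", "Paris", "68", "3030", "Japan", "Tokyo", "124", "4230"]

def retrieve(query, candidates, top_k=3):
    """Rank candidate strings by TF-IDF cosine similarity to the query."""
    vec = TfidfVectorizer().fit(candidates + [query])
    sims = cosine_similarity(vec.transform([query]), vec.transform(candidates))[0]
    ranked = sorted(zip(candidates, sims), key=lambda pair: -pair[1])
    return [cand for cand, _ in ranked[:top_k]]

query = "What is the capital of Japan?"

# Retrieve only the column names and cell values relevant to the question,
# then hand that small slice of the table to the language model.
prompt = (
    f"Relevant columns: {retrieve(query, columns)}\n"
    f"Relevant cells: {retrieve(query, cells)}\n"
    f"Question: {query}"
)
print(prompt)
```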
🧩 How Step-by-Step Reasoning Affects AI's Ability to Solve Puzzles
This study explores how chain-of-thought (CoT) prompting, where AI models explain their reasoning step by step, affects the performance of LLMs on reasoning tasks such as decoding shift ciphers (where each letter is shifted a fixed distance in the alphabet). The researchers tested three models (GPT-4, Claude 3, and Llama 3.1) and found that CoT performance depends on three factors: how likely the expected answer is (probability), what the model saw during training (memorization), and how many steps the reasoning involves (noisy reasoning). These factors matter a lot: GPT-4's accuracy, for example, ranges from 26% to 70% depending on the setup. The authors conclude that CoT prompting relies on both memorization and a form of probabilistic reasoning.
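For context, a shift cipher simply rotates every letter of the alphabet by a fixed amount (rot13 being a familiar example). Below is a small sketch of the task plus an illustrative chain-of-thought prompt; the exact prompt wording used in the study may differ.

```python
import string

def shift_decode(ciphertext: str, shift: int) -> str:
    """Decode a shift cipher by rotating each letter back by `shift` positions."""
    lower = string.ascii_lowercase
    table = str.maketrans(lower, lower[-shift:] + lower[:-shift])
    return ciphertext.lower().translate(table)

ciphertext = "fgnl"                   # "stay" encoded with a shift of 13
print(shift_decode(ciphertext, 13))   # -> "stay"

# Illustrative CoT prompt: the model is asked to work letter by letter
# instead of producing the decoded word directly.
cot_prompt = f"""Decode this shift cipher (each letter was shifted forward by 13).
Ciphertext: {ciphertext}
Think step by step: for each letter, state the shifted-back letter,
then give the decoded word."""
```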
📲 LLMs for Suicide Detection on Social Media
Prompt: girl in her bedroom on her phone browsing social media
This study looks at how to use AI models to detect suicidal ideation in social media posts. Because the volume of posts each day makes manual review impossible, the researchers propose using large language models (LLMs) for automatic detection. In their method, LLMs label posts without needing prior human annotations, and accuracy is then improved using traditional techniques. By combining several LLMs, they improved detection accuracy by 5%. The approach shows promise for accurately identifying posts related to suicide, and larger models tend to perform better at the task.
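The paper's models and prompts aren't given in the summary, but the general pattern it describes, pseudo-labeling posts with LLMs and then leaning on traditional techniques, can be sketched as follows. One plausible reading of "improve the accuracy using traditional techniques" is training a lightweight supervised classifier on the LLM-generated labels; the sketch assumes that reading, and the names `llm_label`, `ensemble_label`, and the model list are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labelers; the study combines several LLMs.
MODELS = ["model_a", "model_b", "model_c"]

def llm_label(post: str, model: str) -> int:
    """Zero-shot LLM call: does this post indicate suicide risk (1) or not (0)?
    Stub only; replace with a real API call to the model named `model`."""
    raise NotImplementedError

def ensemble_label(post: str) -> int:
    """Majority vote across several LLM labelers (the 'combined models' step)."""
    votes = [llm_label(post, m) for m in MODELS]
    return int(sum(votes) > len(votes) / 2)

def build_screening_classifier(posts: list[str]):
    """Pseudo-label posts with the LLM ensemble (no human labels required),
    then train a cheap traditional classifier on those labels so routine
    screening doesn't need an LLM call per post."""
    pseudo_labels = [ensemble_label(p) for p in posts]
    clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    clf.fit(posts, pseudo_labels)
    return clf
```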
🕵️ Understanding and Detecting Errors in LLMs
Authors of the paper
LLMs often make mistakes such as stating incorrect facts, showing bias, or reasoning incorrectly, failures collectively called "hallucinations." Recent research shows that LLMs store information internally about the accuracy of their own outputs, which can help identify these mistakes. This study found that certain internal representations carry more truthfulness information than previously thought, improving error detection, although the detectors don't generalize across all data types. The researchers also found that LLMs can predict the types of mistakes they are likely to make, which helps in designing better error-prevention methods. Interestingly, an LLM can sometimes store the correct answer internally yet still generate a wrong one. These findings offer valuable insight into how LLMs make mistakes and how to improve their accuracy.
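The study's exact setup isn't detailed in the summary, but a common way to test whether "LLMs store information internally about the accuracy of their outputs" is a probing classifier: take the model's hidden states for an answer and train a small classifier to predict whether that answer is correct. The sketch below assumes that approach; gpt2 and the toy labeled examples are purely illustrative.

```python
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)
model.eval()

def answer_hidden_state(text: str, layer: int = -1) -> torch.Tensor:
    """Hidden state of the final token at a chosen layer. Which tokens and
    layers you probe matters; here we simply take the last token."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids)
    return out.hidden_states[layer][0, -1]

# Toy probe data: (question + model answer) strings with a correctness label.
# In practice the labels come from checking the model's answers on a benchmark.
examples = [
    ("Q: Capital of France? A: Paris", 1),
    ("Q: Capital of France? A: Lyon", 0),
    ("Q: 2 + 2 = ? A: 4", 1),
    ("Q: 2 + 2 = ? A: 5", 0),
]

features = torch.stack([answer_hidden_state(t) for t, _ in examples]).numpy()
labels = [y for _, y in examples]

# A linear probe on internal states can flag answers the model itself "knows"
# are likely wrong, even when it generates them anyway.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print(probe.predict(features))
```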
🤯 Today I Learned
Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:
Chain-of-thought (CoT)
Chain of thought prompting is a method used to help large language models think through problems step by step, similar to how people do when solving something complex. Instead of just giving an answer right away, the model is encouraged to explain its reasoning process, breaking it down into smaller steps. This approach improves the model’s ability to handle tasks that involve reasoning, like solving math problems or making decisions. It helps the model produce more accurate and logical answers by focusing on how it got to the solution, not just the final answer.
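As a concrete (illustrative) example, here is the same question posed with and without a chain-of-thought instruction; the prompts and the worked answer are ours, not from any particular paper.

```python
question = "A train travels 60 km in 40 minutes. How far does it travel in 2 hours?"

# Direct prompting: ask for the answer straight away.
direct_prompt = f"{question}\nAnswer:"

# Chain-of-thought prompting: ask the model to lay out its reasoning first.
cot_prompt = f"{question}\nLet's think step by step."

# A chain-of-thought response would look something like:
#   60 km in 40 minutes is 1.5 km per minute.
#   2 hours is 120 minutes.
#   120 minutes * 1.5 km per minute = 180 km.
#   Final answer: 180 km.
```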