[#23] Apple's HyperCloning: Training Using Small Models

Plus Amazon innovates on product translations, Anthropic releases Contextual Retrieval, Detecting deepfakes with smartphones, and RLHF can mislead humans

New: I’m writing a new newsletter dedicated to AI in health and medicine. There is so much new research, it deserves a dedicated newsletter. Subscribe here.

Hello readers, in this issue we cover

  • Apple accelerates training by scaling up the number of parameters using small models

  • Amazon finds a better way to translate product titles in their catalog to multiple languages

  • Anthropic improves RAG with Contextual Retrieval

  • Detecting deepfakes in realtime using vibrations

  • LLMs can learn to mislead humans during RLHF

🏎️ Apple accelerates Pre-training by Scaling Up Parameters on a Small Model

HyperCloning, created by Apple, initializes a large model from a small one by scaling up its parameters, making training more efficient

When training LLMs, starting from scratch with random parameters is slow and expensive. Smaller models are cheaper to train but aren't as accurate as larger ones. This paper explores whether we can use smaller, pre-trained models to kick-start the training of larger models, saving time and costs. They introduce a method called HyperCloning, which scales up the parameters of a smaller model to match those of a larger model while keeping the smaller model's capabilities intact. This way, the larger model starts training with the knowledge of the smaller model, resulting in faster training and reduced computational costs.
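The core trick is a function-preserving expansion: a small layer's weights are tiled into a larger weight matrix so that the big model initially computes the same function as the small one. Here is a minimal sketch of that idea for a single linear layer; the tiling-and-scaling scheme shown is an illustrative assumption, not Apple's exact recipe.

```python
import numpy as np

def hyperclone_linear(W, n=2):
    """Expand a linear layer's weight (d_out x d_in) to (n*d_out x n*d_in)
    by tiling n x n copies scaled by 1/n. A hidden state made of n
    replicated copies of the small state then maps to n replicated copies
    of the small output, so the expanded network preserves the small
    network's function at initialization (sketch of the HyperCloning idea)."""
    return np.tile(W, (n, n)) / n

# Demo: the expanded layer reproduces the small layer's output, replicated.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))      # small layer: 3 -> 4
x = rng.standard_normal(3)           # small hidden state
W_big = hyperclone_linear(W, n=2)    # expanded layer: 6 -> 8
x_big = np.concatenate([x, x])       # replicated hidden state

y_small = W @ x
y_big = W_big @ x_big                # equals [y_small, y_small]
```

Because every layer starts out equivalent to its small counterpart, the large model begins training at the small model's loss rather than at random initialization.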

🛍️ Translating Product Titles in E-commerce Into Multiple Languages with RAG

Amazon researchers can translate product titles to different languages with RAG

Translating product titles for e-commerce sites is tricky because titles are short, often lack context, and contain specialized terms. While large language models (LLMs) can translate across many languages, they sometimes struggle with these challenges. This study proposes a solution using a technique called Retrieval-Augmented Generation (RAG). It works by pulling similar bilingual product examples from existing data and using them to guide the translation process. This approach improves translation quality, especially for languages where the LLM isn't very strong, showing up to a 15.3% improvement in accuracy scores.
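The retrieved bilingual pairs essentially become few-shot examples in the translation prompt. The sketch below shows the shape of that pipeline; the token-overlap similarity, the sample catalog pairs, and the prompt layout are all illustrative assumptions rather than Amazon's actual system.

```python
# Sketch: retrieve similar bilingual title pairs, then use them as
# few-shot examples in the prompt sent to the translation LLM.
def jaccard(a, b):
    """Crude token-overlap similarity (stand-in for a real retriever)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

def build_prompt(title, bilingual_pairs, k=2):
    """Assemble a few-shot translation prompt from the k nearest pairs."""
    examples = sorted(bilingual_pairs,
                      key=lambda p: jaccard(title, p[0]), reverse=True)[:k]
    lines = ["Translate the English product title to German."]
    for en, de in examples:
        lines.append(f"English: {en}\nGerman: {de}")
    lines.append(f"English: {title}\nGerman:")
    return "\n\n".join(lines)

pairs = [  # hypothetical catalog data
    ("Stainless Steel Water Bottle 500ml", "Edelstahl Trinkflasche 500ml"),
    ("Wireless Bluetooth Headphones", "Kabellose Bluetooth-Kopfhörer"),
    ("Ceramic Coffee Mug 300ml", "Keramik Kaffeetasse 300ml"),
]
prompt = build_prompt("Insulated Steel Water Bottle 750ml", pairs)
# `prompt` would then be passed to the LLM for completion.
```

The retrieved examples supply exactly the context that short titles lack: domain terminology and the target locale's formatting conventions.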

🔍 Anthropic releases Contextual Retrieval to improve RAG performance

Anthropic recently released Contextual Retrieval, which is a preprocessing technique that improves retrieval accuracy in RAG applications. Contextual Retrieval uses two sub-techniques: Contextual Embeddings and Contextual BM25. This method can reduce the number of failed retrievals by 49% and, when combined with reranking, by 67%. These represent significant improvements in retrieval accuracy, which directly translates to better performance in downstream tasks.
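The key preprocessing step is prepending each chunk with a short, LLM-generated sentence that situates it in its source document before building either index. A minimal sketch of that step, where `situate_chunk` is a placeholder for the real LLM call described in Anthropic's post:

```python
# Sketch of the Contextual Retrieval preprocessing step.
def situate_chunk(document_title, chunk):
    # In practice, an LLM is given the full document plus the chunk and
    # asked for a brief contextualizing sentence. Placeholder here:
    return f"This chunk is from the document '{document_title}'."

def contextualize(document_title, chunks):
    """Prepend each chunk with its context before indexing."""
    return [f"{situate_chunk(document_title, c)} {c}" for c in chunks]

chunks = [  # hypothetical document chunks
    "Revenue grew 3% over the previous quarter.",
    "Operating costs fell due to reduced cloud spend.",
]
indexed = contextualize("ACME Corp Q2 2023 Report", chunks)
# Both the embedding index (Contextual Embeddings) and the BM25 index
# (Contextual BM25) are then built over `indexed` rather than the raw
# chunks, so a query like "ACME revenue growth Q2 2023" can match a
# chunk that never mentions the company or quarter by itself.
```

Restoring this lost context is what drives the reported drop in failed retrievals: standalone chunks often omit exactly the entities a query searches for.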

📱 Deepfakes are appearing on Zoom or Teams. Detect Them in Realtime with this Novel Approach

In Issue #17, we highlighted research that was able to detect deepfake videos. Here, researchers in Singapore introduce SFake, a real-time deepfake detection method that takes a unique approach: it sends small probes that make the smartphone vibrate, causing slight, controlled movements in the video footage. It then checks whether the facial area follows these movement patterns to determine if the face has been swapped. Tests show that SFake outperforms other methods, offering higher accuracy, faster detection, and lower memory use.

👨🏼‍💻 LLMs can learn to mislead humans via RLHF

Language models are getting better at convincing humans they're correct, even when they're wrong. This is an unintended consequence of a popular training method called RLHF (Reinforcement Learning from Human Feedback).

This study found:

  1. After RLHF, models became more persuasive but not more accurate.

  2. Human evaluators' error rates increased significantly:

    • 24.1% more false positives on a question-answering task

    • 18.3% more false positives on a programming task

  3. Existing methods for detecting intentional deception in AI don't work for this unintended problem.

This is a critical flaw in current RLHF strategies. New methods are needed to help humans accurately assess AI outputs, especially for complex tasks.

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. In this issue, we learned about:

BM25

BM25 is a ranking algorithm used in search engines to score and rank documents based on their relevance to a query. It combines term frequency, inverse document frequency, and document length normalization to determine how well a document matches the search terms. The model uses tunable parameters to balance the impact of these factors, making it flexible and effective. BM25 is widely used due to its simplicity, robustness, and strong performance in information retrieval tasks.
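The formula combines the three factors above (term frequency, inverse document frequency, and length normalization), with `k1` and `b` as the tunable parameters. A compact sketch using the standard Okapi BM25 scoring function with naive whitespace tokenization:

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.
    k1 tunes term-frequency saturation; b tunes length normalization."""
    tokenized = [d.lower().split() for d in docs]
    N = len(docs)
    avgdl = sum(len(toks) for toks in tokenized) / N  # average doc length
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            n_t = sum(1 for d in tokenized if term in d)  # docs containing term
            idf = math.log((N - n_t + 0.5) / (n_t + 0.5) + 1)
            f = tf[term]
            score += idf * f * (k1 + 1) / (
                f + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat",
    "dogs and cats living together",
    "a brief history of information retrieval",
]
scores = bm25_scores("cat mat", docs)
# The first document contains both query terms and scores highest.
```

Note how term frequency saturates: repeating a term many times in one document yields diminishing gains, which is one reason BM25 is robust against keyword stuffing.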

Feed Forward Neural Network

A Feedforward Neural Network (FNN) is a type of artificial neural network where information moves in one direction—from input nodes, through hidden layers, to output nodes—without looping back. It consists of interconnected layers of neurons, each layer transforming the data before passing it to the next. FNNs are used for tasks like classification and regression because they can model complex relationships between inputs and outputs. They are foundational in machine learning and serve as the basis for more complex network architectures.
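The one-directional flow described above is easy to see in code: each layer applies a weight matrix, a bias, and a nonlinearity, and the result feeds only the next layer. A minimal numpy sketch with randomly initialized (untrained) weights:

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)  # common hidden-layer nonlinearity

def feedforward(x, layers):
    """One forward pass. Each layer is a (W, b) pair; information flows
    input -> hidden -> output with no loops back."""
    for W, b in layers[:-1]:
        x = relu(W @ x + b)          # hidden layers
    W, b = layers[-1]
    return W @ x + b                 # linear output layer

rng = np.random.default_rng(1)
layers = [
    (rng.standard_normal((5, 3)), np.zeros(5)),   # 3 inputs -> 5 hidden
    (rng.standard_normal((2, 5)), np.zeros(2)),   # 5 hidden -> 2 outputs
]
y = feedforward(rng.standard_normal(3), layers)   # shape (2,)
```

Training (e.g., via backpropagation) would adjust `W` and `b` to fit data; the forward pass itself is just this chain of matrix multiplies and nonlinearities.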