[#21] Nvidia releases new SOTA reranking model

Plus Microsoft releases a model to simulate financial markets, Meta researchers release Source2Synth, StockTime predicts stock prices with an LLM, and Google finds a better way to translate long-form text with LLMs

Hello readers, in this issue we cover:

  • Nvidia introduces new state-of-the-art reranker for RAG systems

  • Meta researchers release Source2Synth, a better way to generate synthetic data for fine-tuning

  • Microsoft creates Large Market Models (LMM) that can simulate financial markets

  • StockTime is an LLM that can predict stock prices

  • Google finds a better way to translate long-form text

🔎 NVIDIA introduces state-of-the-art reranker, NV-RerankQA-Mistral-4B-v3

Re-ranking is a crucial step in RAG systems to ensure accurate responses

Ranking models are key to improving how well search systems (e.g., Retrieval-Augmented Generation systems) find the most relevant information. These systems typically start by using either dense or sparse retrieval models to gather a list of candidate results. Ranking models then sort those candidates by how closely they match the user's query.

Nvidia introduces the NV-RerankQA-Mistral-4B-v3 reranking model, based on Mistral 7B, which improves accuracy by 14% over alternative rerankers. The accompanying study compares several ranking models and examines their effect on search accuracy, focusing on question-answering tasks.

The study also explores how different model sizes and settings affect performance and highlights the tradeoffs in production deployments.
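For readers who want to see what the shortlist-then-rerank flow looks like in code, here is a minimal sketch using generic open-source stand-ins (rank_bm25 for sparse retrieval and a sentence-transformers cross-encoder checkpoint), not Nvidia's model itself:

```python
# Minimal retrieve-then-rerank sketch. The retriever and cross-encoder
# below are generic stand-ins, not NV-RerankQA-Mistral-4B-v3.
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = [
    "Rerankers reorder retrieved passages by relevance to the query.",
    "BM25 is a sparse lexical retrieval function.",
    "Dense retrievers embed queries and documents into one vector space.",
]
query = "What does a reranker do in a RAG pipeline?"

# Stage 1: cheap sparse retrieval over the whole corpus.
bm25 = BM25Okapi([d.split() for d in docs])
candidates = bm25.get_top_n(query.split(), docs, n=3)

# Stage 2: an expensive cross-encoder scores only the shortlist.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, d) for d in candidates])
reranked = [d for _, d in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```

Because the cross-encoder only sees the shortlist, its higher per-pair cost stays affordable, which is the core trade-off the Nvidia study quantifies.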

🧑🏻‍🏫 Source2Synth: teaching LLMs new skills without relying on costly human annotations

Source2Synth approach in assisting with multi-hop reasoning questions

This paper introduces Source2Synth, a general approach for generating synthetic data based on external real-world sources to create more realistic fine-tuning datasets. LLMs continue to face challenges with complex tasks such as multi-step reasoning, tool usage, and handling structured data. There is often limited data available on how to effectively use source information to solve such tasks. While human-annotated datasets could improve performance during fine-tuning, the process of collecting this data is costly and time-consuming. As a result, synthetic data generation offers a more practical solution to address this issue.
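As a rough illustration of the Source2Synth recipe (generate examples grounded in a real source, then curate them using the model itself), here is a hedged Python sketch; the `llm` helper is a hypothetical stand-in for any instruction-following model, and the prompts are our own illustration, not the paper's:

```python
# Illustrative sketch of the Source2Synth idea: ground synthetic examples
# in a real source, then let the model filter out low-quality ones.
# `llm` is a hypothetical text-completion function, not a real API.

def llm(prompt: str) -> str:
    """Placeholder for any instruction-following LLM call."""
    raise NotImplementedError

def generate_example(source_passage: str) -> dict:
    seed = llm(f"Pick an interesting entity or fact in:\n{source_passage}")
    question = llm(f"Write a multi-hop question about '{seed}' answerable "
                   f"from this source:\n{source_passage}")
    answer = llm(f"Answer step by step using the source:\n"
                 f"{source_passage}\nQ: {question}")
    return {"source": source_passage, "question": question, "answer": answer}

def curate(examples: list[dict]) -> list[dict]:
    # Keep only examples the model can reproduce from the source,
    # a cheap proxy for answerability used instead of human review.
    kept = []
    for ex in examples:
        check = llm(f"Given:\n{ex['source']}\nQ: {ex['question']}\nA:")
        if check.strip() == ex["answer"].strip():
            kept.append(ex)
    return kept
```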

📈 Microsoft creates the Large Market Model (LMM), a new generative model that powers a financial market simulation engine

Large Market Models can be used for financial market simulation

Researchers at Microsoft introduce the Large Market Model (LMM), an order-level generative model for simulating financial markets. Their Market Simulation Engine (MarS), powered by LMM, creates realistic, interactive market environments where users can test strategies without financial risk. The main goals of the research include evaluating LMM’s scalability in financial markets, testing MarS’s realism, and exploring its applications, like forecasting, anomaly detection, and agent training.
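The order-level, autoregressive flavor of LMM can be pictured with a small sketch; `next_order` is a hypothetical placeholder for the trained model's sampling step, not MarS's actual interface:

```python
# Conceptual sketch of order-level market simulation: a generative model
# emits the next order conditioned on recent order flow, and each sampled
# order is fed back in autoregressively.
from dataclasses import dataclass

@dataclass
class Order:
    side: str    # "buy" or "sell"
    price: float
    size: int

def next_order(history: list[Order]) -> Order:
    """Placeholder for the generative model's sampling step."""
    raise NotImplementedError

def simulate(seed_orders: list[Order], steps: int) -> list[Order]:
    history = list(seed_orders)
    for _ in range(steps):
        order = next_order(history)   # sample conditioned on order flow
        history.append(order)         # feed back autoregressively
    return history
```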

📈 StockTime: an LLM that can predict stock prices

StockTime architecture

StockTime is a highly efficient model for predicting stock prices with LLMs. It extracts relationships between different stocks, trends, and timestamps from stock data and converts them into text to help the model understand stock patterns better. Unlike other methods that rely on large amounts of text data for training, StockTime works directly with stock price data. The authors show that StockTime outperforms existing financial LLMs by 5%, opening up new possibilities for using LLMs to predict stock prices on an intraday and hourly basis.
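A toy sketch of the prices-to-text idea: summarize a price window as natural language the LLM can condition on. The template wording here is our own illustration, not StockTime's actual format:

```python
# Hedged sketch: turn a window of timestamped prices into a short textual
# description (trend, range, timestamps) an LLM could condition on.
def describe_window(ticker: str, timestamps: list[str],
                    prices: list[float]) -> str:
    trend = "up" if prices[-1] >= prices[0] else "down"
    return (
        f"{ticker} from {timestamps[0]} to {timestamps[-1]}: "
        f"open {prices[0]:.2f}, close {prices[-1]:.2f}, "
        f"low {min(prices):.2f}, high {max(prices):.2f}, trend {trend}."
    )

print(describe_window("AAPL",
                      ["2024-09-10 10:00", "2024-09-10 11:00"],
                      [226.40, 228.10]))
```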

🀄️ Google tackles translating texts step-by-step with LLMs

Translating step-by-step framework with Gemini 1.5

Google researchers present a step-by-step method for translating long texts using LLMs. Inspired by translation studies, they break the process into several stages: pre-translation research, drafting, refining, and proofreading, all carried out through multiple interactions with the Gemini 1.5 Pro model. Their tests on ten languages show that this approach produces better translations than translating an entire document in one pass.

When compared to other translation methods, including similar human-like LLM approaches and systems that translate documents piece by piece, the step-by-step method performs better. This suggests that LLMs can improve their translations progressively, moving away from the traditional one-step translation model.
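The staged workflow can be sketched as a chain of LLM calls; `gemini` below is a hypothetical helper, and the prompts are illustrative rather than the paper's exact wording:

```python
# Sketch of the staged translation loop described above: research notes,
# then draft, refine, and proofread, one LLM call per stage.
def gemini(prompt: str) -> str:
    """Placeholder for a call to an LLM such as Gemini 1.5 Pro."""
    raise NotImplementedError

def translate_step_by_step(source_text: str, target_lang: str) -> str:
    notes = gemini(f"Before translating to {target_lang}, list key terms, "
                   f"names, and ambiguities in:\n{source_text}")
    draft = gemini(f"Translate to {target_lang} using these notes:\n"
                   f"{notes}\n\nText:\n{source_text}")
    refined = gemini(f"Refine this {target_lang} translation for fluency "
                     f"and accuracy against the source:\n{draft}")
    return gemini(f"Proofread and return the final {target_lang} text:\n"
                  f"{refined}")
```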

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:

Autoregressive Encoder

An autoregressive encoder is a type of model used in machine learning that processes input data sequentially, predicting or encoding each part of the sequence based on the previous parts. In other words, it generates the next element in a sequence by looking at the elements that came before it.

This approach is often used in tasks involving sequential data, like language modeling, where the goal is to predict the next word in a sentence based on the preceding words. The autoregressive nature means that the model doesn’t process everything at once but builds the output step by step, depending on earlier steps.
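A toy decoding loop makes this concrete; `next_token_logits` stands in for a trained model's forward pass, so the scores here are random:

```python
# Tiny illustration of autoregressive decoding: each new token is chosen
# from a distribution conditioned only on the tokens generated so far.
import random

def next_token_logits(context: list[str]) -> dict[str, float]:
    """Placeholder for a model scoring the vocabulary given the context."""
    vocab = ["the", "cat", "sat", "<eos>"]
    return {tok: random.random() for tok in vocab}

def generate(prompt: list[str], max_steps: int = 10) -> list[str]:
    seq = list(prompt)
    for _ in range(max_steps):
        logits = next_token_logits(seq)      # depends on prior tokens only
        token = max(logits, key=logits.get)  # greedy decoding
        if token == "<eos>":
            break
        seq.append(token)                    # feed back into the context
    return seq

print(generate(["the"]))
```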

FinancialLLM / FinLLM

A FinLLM (Financial Large Language Model) is a type of Large Language Model specifically designed or fine-tuned for financial tasks. These models leverage the same underlying architecture as general-purpose LLMs like GPT, but they are adapted to handle domain-specific data, such as financial reports, market news, stock prices, and regulatory filings.

FinLLMs are trained on large datasets of financial texts and numerical data to improve their ability to perform tasks like stock price prediction, sentiment analysis, financial forecasting, and natural language processing (NLP) tasks within the finance domain. They can process and analyze structured and unstructured data, providing insights into market trends, extracting key financial information, and generating forecasts based on historical and real-time financial data.

Reranker

A reranker is a model or algorithm used in information retrieval systems to refine and reorder a set of candidate results based on their relevance to a given query. In a typical multi-stage retrieval pipeline, an initial retrieval model (often based on simple techniques like BM25 or dense retrieval) generates a broad list of potentially relevant documents or passages. This list is often noisy, containing both highly relevant and less relevant results.

The reranker steps in after this initial retrieval, taking the candidates and applying a more sophisticated model—such as a neural network-based ranking model—to assign a better relevance score and reorder them. The reranker typically uses richer features, such as query-document similarity computed using deep embeddings or fine-tuned transformer models, to better capture subtle nuances in the data.

By focusing on a smaller subset of candidates, rerankers can afford to use more complex and computationally expensive methods, thus improving the precision and quality of the final ranked list without the overhead of processing the entire document collection. This is especially useful in applications like search engines, question answering systems, and recommendation engines.