Issue #12: AI Trading Agents Produce Superior Returns

How DeepMind built the Imagen 3 text-to-image model, predicting stock performance based on earnings reports, recovering prompts used to generate an image, better one-shot imitation learning

In this issue, we cover

  • AI trading agents that outperform the market

  • How DeepMind built its Imagen 3 text-to-image model

  • Discovering the prompts that generated an image

  • LLMs can make stock predictions based on earnings reports

  • A new technique for one-shot imitation learning

📈 AI Trading Agents Can Outperform the Market by 15-30%

This paper provides a comprehensive review of current research on using LLMs for trading. It backtested various architectures, such as LLM as a Trader and LLM as an Alpha Miner, and compared their performance. Overall, LLM-powered trading agents demonstrated strong performance, achieving annualized returns ranging from 15% to 30% over the strongest baseline during backtesting with real market data, which demonstrates the great potential of using LLMs in financial trading.
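
The survey itself does not ship code, but the "LLM as a Trader" pattern it benchmarks boils down to a loop that asks a model for a daily action and tracks the resulting portfolio. Below is a minimal sketch of that loop; the `ask_llm` helper is a hypothetical stand-in for a real LLM call, and the price and news series are made up for illustration.

```python
import random

random.seed(0)

def ask_llm(context: str) -> str:
    # Hypothetical stand-in for a real LLM API call that returns a trading action.
    return random.choice(["buy", "sell", "hold"])

def backtest(prices, news, starting_cash=10_000.0):
    cash, shares = starting_cash, 0.0
    for price, headline in zip(prices, news):
        action = ask_llm(f"Today's price: {price:.2f}. News: {headline}")
        if action == "buy" and cash > 0:
            shares, cash = shares + cash / price, 0.0
        elif action == "sell" and shares > 0:
            cash, shares = cash + shares * price, 0.0
    return cash + shares * prices[-1]  # final portfolio value

prices = [100, 102, 101, 105, 107, 104, 110]
news = ["earnings beat", "guidance raised", "sector selloff",
        "analyst upgrade", "rate-cut hopes", "profit taking", "record revenue"]
print(f"Final portfolio value: {backtest(prices, news):.2f}")
```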

🤖 How DeepMind Built the Imagen 3 Text-to-Image Model

Generated by Imagen 3

In May, DeepMind released their Imagen 3 model, a latent diffusion model that generates high-quality images from text prompts. Now, DeepMind has released the paper behind Imagen 3, detailing their approach.

  • Trained on a mix of text, images, and associated annotations

  • Hardware: Used TPUv4 and TPUv5 for training

  • Software: Used the JAX library for training (see the sketch after this list)

  • Multistage data preprocessing: removing unsafe and AI-generated images, deduplicating images, and adding synthetically generated captions

  • Humans evaluated five quality aspects of generated images: overall preference, prompt-image alignment, visual appeal, detailed prompt-image alignment, and numerical reasoning

  • Automatic evaluation metrics were used for image quality and prompt-image alignment
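
Imagen 3's training code and architecture details are not public, so the sketch below is only a generic JAX/optax training step of the kind the hardware and software bullets above imply: a jitted function that computes a loss, takes gradients, and applies an optimizer update. The toy linear "model" and squared-error loss are placeholders, not the real latent diffusion objective.

```python
import jax
import jax.numpy as jnp
import optax

optimizer = optax.adamw(learning_rate=1e-4)

def loss_fn(params, batch):
    # Placeholder loss: a linear map with a squared error, standing in for the
    # (non-public) denoising objective of a latent diffusion model.
    preds = batch["x"] @ params["w"]
    return jnp.mean((preds - batch["y"]) ** 2)

@jax.jit
def train_step(params, opt_state, batch):
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    updates, opt_state = optimizer.update(grads, opt_state, params)
    params = optax.apply_updates(params, updates)
    return params, opt_state, loss

params = {"w": jnp.zeros((16, 4))}
opt_state = optimizer.init(params)
batch = {
    "x": jax.random.normal(jax.random.PRNGKey(0), (32, 16)),
    "y": jnp.ones((32, 4)),
}
params, opt_state, loss = train_step(params, opt_state, batch)
print(float(loss))  # on TPUs this step would typically also be sharded across devices
```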

⌨️ Finding the Prompts That Generated an Image

Recovering the prompt used to generate an image

Want to know how to reverse engineer an AI image to find the prompt that generated it? There are several algorithms that can invert the image generation process to recover the prompt. These algorithms include PEZ, GCG, AutoDAN, and BLIP2’s image captioner.

This paper compares the natural language prompts recovered by each method using several metrics, followed by a discussion of each method's proficiency.

🦙 Llama3 Can Make Stock Predictions Based on Earnings Reports

A team of researchers instruction-tuned LLMs with a novel combination of instruction-based techniques and QLoRA compression. They found that Llama3 performs best among the models compared when making stock predictions from earnings reports.

The team integrated “base factors”, such as financial metric growth and earnings transcripts, with “external factors”, including market index performance and analyst grades, to create a rich supervised dataset. The dataset was then fed into various instruction-tuned LLMs, which showed superior predictive performance compared to benchmarks.
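
As a rough illustration of how one such supervised example might be assembled, here is a sketch that folds base and external factors into an instruction/response pair. The field names, prompt wording, and label format are assumptions for illustration, not the paper's exact schema.

```python
# One hypothetical training record combining "base" and "external" factors.
record = {
    "base_factors": {
        "revenue_growth": 0.12,
        "eps_growth": 0.08,
        "transcript_summary": "Management raised full-year guidance.",
    },
    "external_factors": {"sp500_return_30d": 0.03, "analyst_grade": "Buy"},
    "label": "rise",  # next-period stock movement used as the supervision signal
}

instruction = (
    "Given the company's earnings report and market context, "
    "predict whether the stock will rise or fall.\n"
    f"Revenue growth: {record['base_factors']['revenue_growth']:.0%}\n"
    f"EPS growth: {record['base_factors']['eps_growth']:.0%}\n"
    f"Transcript: {record['base_factors']['transcript_summary']}\n"
    f"S&P 500 30-day return: {record['external_factors']['sp500_return_30d']:.0%}\n"
    f"Analyst grade: {record['external_factors']['analyst_grade']}"
)

example = {"instruction": instruction, "output": record["label"]}
print(example["instruction"])
```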

🔫 A New Technique for One-Shot Imitation Learning Using Fewer Labels

One-shot Imitation Learning (OSIL) is a technique that allows AI to learn a new task by watching just one example. Normally, teaching AI in this way requires a huge number of examples, each showing different ways to do the same task, which is not practical. To make this more feasible, the researchers propose a new method where the AI learns from a large collection of examples without specific labels, combined with a small set of labeled examples. They developed an algorithm that helps the AI group similar tasks together, allowing it to create its own labeled examples from the large, unlabeled dataset. Their tests showed that this approach can be just as effective as using fully labeled data, making it a big step forward in teaching AI with fewer labels.
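
Here is a minimal sketch of that pseudo-labeling idea, assuming demonstrations have already been embedded into fixed-size vectors by some trajectory encoder (hypothetical here): cluster the unlabeled demonstrations, then use the small labeled set to name each cluster so the unlabeled demos become extra training pairs. The clustering method and dataset sizes are illustrative choices, not the paper's.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
unlabeled_embeddings = rng.normal(size=(500, 32))  # embeddings of unlabeled demos
labeled_embeddings = rng.normal(size=(20, 32))     # small labeled set
labeled_tasks = rng.integers(0, 4, size=20)        # known task ids for labeled demos

# Group similar demonstrations together.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(unlabeled_embeddings)

# Name each cluster by the task whose labeled demos land in it most often,
# turning the unlabeled demos into extra (pseudo-)labeled training pairs.
labeled_clusters = kmeans.predict(labeled_embeddings)
cluster_to_task = {
    c: np.bincount(labeled_tasks[labeled_clusters == c]).argmax()
    for c in np.unique(labeled_clusters)
}
pseudo_labels = [cluster_to_task.get(c) for c in kmeans.labels_]
print(sum(p is not None for p in pseudo_labels), "demos pseudo-labeled")
```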

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:

Contrastive Language-Image Pretraining (CLIP)

CLIP (Contrastive Language-Image Pretraining) is a model developed by OpenAI that can understand and relate images and text in a highly sophisticated way. CLIP is designed to learn visual concepts from natural language descriptions, making it possible to perform tasks like image classification, object recognition, and even generating textual descriptions of images without being explicitly trained on specific datasets for those tasks.
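
A minimal zero-shot classification sketch using the openly released CLIP weights on Hugging Face; the checkpoint name is one public variant, and the dummy image and candidate labels are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.new("RGB", (224, 224), "red")  # stand-in for a real photo
labels = ["a photo of a cat", "a photo of a dog", "a red square"]

# Score the image against each text label and normalize into probabilities.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image  # image-text similarity scores
probs = logits.softmax(dim=-1)[0]

for label, p in zip(labels, probs):
    print(f"{label}: {p.item():.1%}")
```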

Prompts Made Easy (PEZ)

PEZ (Prompts Made Easy) is a method designed to optimize "hard prompts," which are specific, interpretable prompts used to guide models like CLIP (Contrastive Language-Image Pretraining). Traditional hard prompts are manually crafted and can be challenging to create effectively. PEZ automates this process by discovering optimal prompts through a gradient-based optimization method.

The PEZ approach is beneficial for both language and vision-language models, allowing users to generate and fine-tune prompts that effectively direct the model's outputs. This technique improves the efficiency and effectiveness of prompt engineering, making it easier for users to leverage AI models for various tasks without requiring deep expertise in crafting prompts.
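
The core PEZ trick is to keep a continuous "soft" prompt, project it onto the nearest vocabulary embeddings for the forward pass, and then apply the gradient computed at the projected (hard) prompt back to the soft prompt. The toy sketch below demonstrates that loop with a random embedding table and a mean-pooling "encoder" standing in for CLIP; it is illustrative only, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

vocab_size, dim, prompt_len = 1000, 64, 8
vocab = torch.randn(vocab_size, dim)   # frozen token embedding table (toy CLIP vocab)
target = torch.randn(dim)              # frozen target feature (e.g. an image embedding)

def encode(prompt_embeds):
    # Toy differentiable "text encoder": mean-pool the token embeddings.
    return prompt_embeds.mean(dim=0)

def project(soft):
    # Map each continuous embedding to its nearest vocabulary embedding.
    ids = torch.cdist(soft, vocab).argmin(dim=1)
    return vocab[ids], ids

soft_prompt = torch.randn(prompt_len, dim, requires_grad=True)
opt = torch.optim.Adam([soft_prompt], lr=0.1)

for step in range(200):
    hard, ids = project(soft_prompt.detach())
    hard.requires_grad_(True)
    loss = 1 - F.cosine_similarity(encode(hard), target, dim=0)
    loss.backward()
    # PEZ gradient reuse: apply the gradient taken at the hard prompt to the soft prompt.
    opt.zero_grad()
    soft_prompt.grad = hard.grad.clone()
    opt.step()

_, final_ids = project(soft_prompt.detach())
print("discovered token ids:", final_ids.tolist())
```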

Quantized Low-Rank Adaptation (QLoRA)

QLoRA (Quantized Low-Rank Adaptation) is a technique for efficiently fine-tuning large language models. It combines quantization, which reduces the precision of model weights to save memory, with Low-Rank Adaptation (LoRA), which fine-tunes the model using fewer parameters. This approach makes it possible to fine-tune large models on smaller hardware, like consumer GPUs, without significant performance loss, making large-scale models more accessible for specific tasks.
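
In practice, QLoRA setups are commonly built with the Hugging Face transformers, peft, and bitsandbytes libraries. The sketch below shows one such configuration: a 4-bit NF4 quantized base model wrapped with LoRA adapters. The checkpoint name and hyperparameters are illustrative choices, not prescriptions.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit NF4 quantization of the frozen base model (the "Q" in QLoRA).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# Illustrative (gated) checkpoint; any causal LM works the same way.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Small trainable low-rank adapters (the "LoRA" part); ranks/targets are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights are trained
```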

Instruction Fine-Tuning

Instruction fine-tuning is a technique used to train large language models to better understand and follow specific instructions given by users. It involves fine-tuning a model on a dataset of examples where the input includes a clear instruction and the output is the model's expected response. This process helps the model learn how to interpret and respond accurately to various prompts or commands, improving its ability to perform specific tasks, answer questions, or generate content based on user instructions. The result is a model that is more aligned with user intentions and can handle a wider range of tasks more effectively.

One-shot Imitation Learning (OSIL)

One-Shot Imitation Learning is a machine learning approach where an AI agent learns to perform a new task by observing a single demonstration of that task. Unlike traditional learning methods that require many examples to learn a task effectively, one-shot imitation learning allows the model to generalize from just one example.