General Intelligence
Posts
Issue #10: The larger the LLM, the more susceptible to data poisoning attacks

Issue #10: The larger the LLM, the more susceptible to data poisoning attacks

Plus IBM researchers boost neural network performance, Intel releases a new RAG framework, generating 3D images, and how LLMs can visually plan new activities

Chris Han
August 08, 2024

Hello readers! In today’s issue, we cover several new research papers:

How large LLMs are more susceptible to data poisoning attacks
IBM researchers boost neural network performance through new architectures
Intel releases a new framework for building RAG apps
Generating 3D images from 64×64 pixel images
How LLMs can automatically plan daily activities

☠️ Scaling Laws for Data Poisoning LLMs

Models become more susceptible to data poisoning as they scale up

This paper investigates how data poisoning—where harmful data is introduced during training—affects LLMs as they increase in size. The study explores three threat models: malicious fine-tuning, imperfect data curation, and intentional data contamination, testing these on LLMs ranging from 1.5 to 72 billion parameters. The findings show that larger models are increasingly susceptible to learning harmful behaviors more quickly than smaller models, highlighting the urgent need for robust safeguards as models scale up.

Read the paper

🧠 IBM Researchers Boost Performance of Neural Networks

IBM researchers propose a framework that can generalize neural networks through operations on activation functions, thereby boosting its performance.

Activation functions are non-linear functions used in neural networks that allow them to learn complex mappings between inputs and outputs.
Common activation functions include ReLU, Tanh, and Sigmoid, with the choice depending on the application domain.
This work proposes a unifying framework that theoretically explains the performance benefits of different activation function approaches.
The framework also allows for the development of "extensions" - special generalizations of neural networks - through operations on activation functions.
The paper shows that these "extended" neural networks have performance benefits over standard neural networks, with minimal increases in space and time complexity.
The performance benefits of the "extended" neural networks are demonstrated on both standard test functions as well as real-world time-series datasets.

Read the paper

👩‍💻 Intel releases framework for building RAG apps

RAG Foundry framework

Intel releases RAG FOUNDRY, an open-source framework for augmenting LLMs for RAG use cases. RAG FOUNDRY integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources.

Read the research paper

📐 Generating 3D models from 64×64 pixels

This paper presents a novel method for creating 3D models using 64x64 pixel images, called "Object Images." These images encapsulate a 3D object's geometry, appearance, and structure, making the complex process of 3D shape generation more manageable. The approach uses image generation models to create realistic 3D shapes, evaluated using the ABO dataset. The results show comparable performance to existing 3D generative models and support Physically Based Rendering (PBR) material generation.

🚶‍♀️‍➡️ Using LLMs to automatically plan daily activities

The research investigates the capability of multimodal reasoning models powered by LLMs to facilitate vision-powered assistants for multi-step daily activities. Such assistants need to:

Encode relevant visual history from sensors (e.g. camera)
Forecast future actions to accomplish the activity
Replan based on the user in the loop

The researchers benchmarked two prominent classes of multimodal LLM approaches - Socratic Models and Vision Conditioned Language Models (VCLMs) - on video-based action anticipation tasks using offline datasets. This tested the first two capabilities of grounding visual history and forecasting.

However, the offline benchmarks did not allow the researchers to close the loop with the user, which is essential to evaluate the replanning capabilities and measure successful activity completion in assistive scenarios.

To address this, the researchers conducted a user study with 18 participants performing 3 different multi-step cooking activities while wearing an egocentric observation device and following assistance from multimodal LLMs.

The key findings were:

The Socratic approach outperformed VCLMs in both offline and online settings
Grounding long visual history, common in activity assistance, remains challenging, especially for VCLMs
Offline metrics do not necessarily indicate online performance

Read the paper

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:

Activation Function

An activation function is a mathematical function that determines the output of a neuron based on the input it receives. The activation function introduces non-linearity into the neural network, allowing it to learn and model complex relationships in the data. Without an activation function, the neural network would only be able to learn linear relationships, which would severely limit its capabilities. The most common activation functions used in neural networks include the sigmoid, tanh, and ReLU (Rectified Linear Unit) functions, each with their own unique characteristics and applications.

UV Mapping

UV mapping is a technique in 3D modeling where a 2D image (texture) is wrapped around a 3D object. "U" and "V" are the axes of the 2D texture, distinguishing them from the 3D spatial coordinates (X, Y, Z). This method helps apply detailed textures, like skin on a character or wood grain on furniture, to 3D models, making them look more realistic. Essentially, it involves creating a flat representation of a 3D surface to accurately map textures onto the model.

Wikipedia

Image Diffusion

Image diffusion is a process used in computer vision and graphics to create or modify images by iteratively applying transformations. It often involves algorithms that add, blend, or propagate visual information across an image to enhance details, remove noise, or generate new visuals. In the context of AI and machine learning, image diffusion techniques can help in tasks like super-resolution, inpainting, and generating realistic textures by simulating how light and color spread over surfaces.

User-in-the-loop

User-in-the-loop is a design approach where human input is integrated into automated systems or processes. Key points:

Combines human judgment with AI/automated processes
Used in machine learning, content moderation, design tools, etc.
Improves accuracy and handles complex scenarios better
Challenges include interface design and balancing automation with human input
Raises ethical considerations about human agency and bias

This approach aims to leverage both machine efficiency and human expertise for more effective and adaptable solutions.

Data Poisoning

Data poisoning is an attack on machine learning models where malicious data is injected into the training dataset. This corrupted data aims to deceive the model, causing it to learn incorrect patterns or make faulty predictions. The purpose of such attacks can vary, including degrading model performance, causing it to behave erratically, or manipulating it to produce specific incorrect outcomes. Data poisoning can be particularly harmful as it undermines the integrity and reliability of AI systems.