General Intelligence

[#29] AI can identify food from around the world

Plus Amazon improves fashion recommendations, detecting brain tumors from MRIs, robots imitating humans, Zillow builds a real-estate chatbot

Chris Han
October 17, 2024

Hello readers, in this issue we cover:

Detecting brain tumors with convolutional neural nets
Robots can imitate humans after watching a single video
A massive dataset of food from around the world, and AI that can recognize it
Amazon improves fashion recommendations through pre-training and fine-tuning
Zillow builds a real-estate chatbot that complies with housing regulations

🍗 AI can identify food from around the world

Dishes from around the world can be identified by AI with this large dataset

WORLDCUISINES is a large benchmark that tests Vision Language Models (VLMs) on culture-specific knowledge, particularly in non-English languages and underrepresented cultures. It includes a visual question answering (VQA) dataset with over 1 million text-image pairs across 30 languages and 9 language families, focusing on tasks like identifying dish names and their origins. The benchmark is split into a training set (1 million instances) and two evaluation sets (12k and 60k instances). Results show that VLMs perform better when given location context but struggle with adversarial contexts and with predicting specific regional cuisines. A knowledge base with annotated food data and images is also released to aid further research.

Read the research paper

🧠 Detecting Brain Tumors in MRIs with CNNs

This study explores various statistical and machine learning models to detect and classify brain tumors using MRI images, focusing on early-stage diagnosis to improve treatment outcomes. Models tested include statistical approaches (like linear and logistic regressions) and machine learning methods (like decision trees, random forests, perceptrons, CNNs, and RNNs). The results show that convolutional neural networks (CNNs) perform the best. The CNN model is also effective for multi-class classification, distinguishing between normal, glioma, meningioma, and pituitary tumor MRI images. The study highlights the potential of machine learning in assisting radiologists with early, accurate diagnosis of brain tumors.

Read the research paper

🤖 Teaching robots to imitate humans from video

A robot learning through imitation

This paper presents OKAMI, a method for teaching humanoid robots manipulation skills by imitating actions from a single video. OKAMI generates a manipulation plan from an RGB-D video and allows the robot to adjust its actions based on object locations during execution. It uses vision models to identify relevant objects and retarget human motions and hand poses for the robot. Experiments show OKAMI performs better than existing methods, adapting well to different visual and spatial conditions. It also trains closed-loop visuomotor policies with a 79.2% success rate, without requiring extensive human intervention.

Read the research paper

👗 Amazon improves fashion recommendations by fine-tuning LLMs with fashion data

Fashion recommendations based on pre-trained LLMs

This paper addresses the challenges of recommendation systems in the fashion industry, a key area of global e-commerce. Most recommendation systems are built for general e-commerce and struggle with fashion-specific issues. To improve on this, the authors propose a sequential fashion recommendation framework that uses a pre-trained large language model (LLM) with recommendation-focused prompts. The framework is fine-tuned on fashion data and introduces a new mix-up-based retrieval method to translate text into product suggestions. Experiments show that this approach significantly improves fashion recommendation performance.

Read the research paper

🏡 Zillow builds a real estate chatbot designed to comply with housing laws and regulations

A sample prompt that can lead the LLM to respond with non-compliant answers

Zillow, a US-based real-estate platform, created a chatbot specialized in real estate, designed to align with human preferences and avoid discriminatory practices like steering and redlining, which have historically affected the U.S. real estate industry. The authors generated a synthetic dataset for general instruction-following and safety. They then fine-tuned a llama-3-8B-instruct model, significantly improving its performance to rival larger models like GPT-4 while also making it safer and more compliant. The model, data, and code are open-sourced to support further development.

Read the research paper

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. In this issue, we learned about:

Convolutional Neural Network (CNN)

A Convolutional Neural Network (CNN) is a type of computer model that helps recognize patterns in images, like identifying objects or features. It works by scanning the image and learning to detect important details, making it useful for tasks like facial recognition or medical imaging.
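
To make the "scanning" idea concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional layer, written in plain Python. The function name `convolve2d`, the tiny 4x4 image, and the hand-picked edge kernel are all illustrative assumptions; a real CNN learns its kernel values during training and stacks many such layers.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the image patch under it and sum the result.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
# In a trained CNN these numbers would be learned, not chosen by hand.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The middle column of the output lights up exactly where the image switches from dark to bright, which is the kind of "important detail" a convolutional filter picks out.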

Recurrent Neural Network (RNN)

A Recurrent Neural Network (RNN) is a neural network designed for sequential data that maintains memory of previous inputs. Unlike standard neural networks that treat each input independently, RNNs have feedback loops allowing them to use past information to influence current outputs. They're commonly used for tasks like text generation, translation, and speech recognition where context matters.
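
The feedback loop can be sketched in a few lines of plain Python. This is a simplified illustration, not a production model: it uses a single scalar hidden state, and the names `rnn_step` and `run_rnn` as well as the weights `w_x`, `w_h`, and `b` are made-up values chosen for the example. A real RNN uses vectors and matrices and learns its weights from data.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent update: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the cell, carrying the hidden state forward."""
    h = 0.0  # the "memory" starts empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)  # h depends on x AND on the earlier h
        states.append(h)
    return states


# Two sequences that differ only in their first element.
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])

print(states_a)  # the final state is still nonzero: the first input is remembered
print(states_b)  # all zeros: nothing to remember
```

Even though the last two inputs are identical in both sequences, the final hidden states differ, because the first input is carried forward through the loop. That carried-over state is what lets an RNN use earlier context when producing later outputs.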