[#29] AI can identify food from around the world
Plus Amazon improves fashion recommendations, detecting brain tumors from MRIs, robots imitating humans, Zillow builds a real-estate chatbot
Hello readers, in this issue we cover:
Detecting brain tumors with convolutional neural nets
Robots can imitate humans after watching a single video
A massive dataset of food from around the world, and AI that can recognize it
Amazon improves fashion recommendations through pre-training and fine-tuning
Zillow builds a real-estate chatbot that complies with housing regulations
🍗 AI can identify food from around the world
Dishes from around the world can be identified by AI with this large dataset
WORLDCUISINES is a large benchmark that tests Vision Language Models (VLMs) on culture-specific knowledge, particularly in non-English languages and underrepresented cultures. It includes a visual question answering (VQA) dataset with over 1 million text-image pairs across 30 languages and 9 language families, focusing on tasks like identifying dish names and their origins. The benchmark provides a training set (1 million instances) and two evaluation sets (12k and 60k instances). Results show that VLMs perform better when given location context, but struggle with adversarial contexts and with predicting specific regional cuisines. A knowledge base of annotated food entries and images is also released to aid further research.
🧠 Detecting Brain Tumors in MRIs with CNNs
This study explores various statistical and machine learning models to detect and classify brain tumors using MRI images, focusing on early-stage diagnosis to improve treatment outcomes. Models tested include statistical approaches (like linear and logistic regressions) and machine learning methods (like decision trees, random forests, perceptrons, CNNs, and RNNs). The results show that convolutional neural networks (CNNs) perform the best. The CNN model is also effective for multi-class classification, distinguishing between normal, glioma, meningioma, and pituitary tumor MRI images. The study highlights the potential of machine learning in assisting radiologists with early, accurate diagnosis of brain tumors.
🤖 Teaching robots to imitate humans from video
A robot learning through imitation
This paper presents OKAMI, a method for teaching humanoid robots manipulation skills by imitating actions from a single video. OKAMI generates a manipulation plan from an RGB-D video and allows the robot to adjust its actions based on object locations during execution. It uses vision models to identify relevant objects and retarget human motions and hand poses for the robot. Experiments show OKAMI performs better than existing methods, adapting well to different visual and spatial conditions. It also trains closed-loop visuomotor policies with a 79.2% success rate, without requiring extensive human intervention.
👗 Amazon improves fashion recommendations by fine-tuning LLMs with fashion data
Fashion recommendations based on pre-trained LLMs
This paper addresses the challenges of recommendation systems in the fashion industry, a key area of global e-commerce. Most recommendation systems are built for general e-commerce and struggle with fashion-specific issues. To address this, the authors propose a sequential fashion recommendation framework that uses a pre-trained LLM with recommendation-focused prompts. The framework is fine-tuned on fashion data and introduces a new mix-up-based retrieval method to translate text into product suggestions. Experiments show that this approach significantly improves fashion recommendation performance.
🏡 Zillow builds a real estate chatbot designed to comply with housing laws and regulations
A sample prompt that can lead the LLM to respond with non-compliant answers
Zillow, a US-based real-estate platform, created a chatbot specialized in real estate, designed to align with human preferences and avoid discriminatory practices like steering and redlining, which have historically affected the U.S. real estate industry. The authors generated a synthetic dataset covering general instruction-following and safety, then used it to fine-tune a llama-3-8B-instruct model. The fine-tuned model's performance improved significantly, rivaling larger models like GPT-4, while also becoming safer and more compliant. The model, data, and code are open-sourced to support further development.
🤯 Today I Learned
Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:
Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a type of computer model that helps recognize patterns in images, like identifying objects or features. It works by scanning the image and learning to detect important details, making it useful for tasks like facial recognition or medical imaging.
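To make the "scanning" idea concrete, here is a minimal sketch in plain Python of the sliding-window operation at the heart of a CNN layer. The function name convolve2d, the toy 4x4 image, and the hand-picked kernel are all illustrative assumptions. A real CNN learns its kernel values during training and stacks many such layers, so treat this as a picture of the mechanism rather than a working model.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel with the image patch under it and sum the result
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy image (assumption): dark pixels (0) on the left, bright pixels (1) on the right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel (assumption) that responds to dark-to-bright vertical edges
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)
```

The feature map is largest in the middle column, exactly where the image switches from dark to bright. That is the sense in which the layer "detects" a feature: the same small kernel is reused at every position, so the edge is found wherever it appears.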
Recurrent Neural Network (RNN)
A Recurrent Neural Network (RNN) is a neural network designed for sequential data that maintains memory of previous inputs. Unlike standard neural networks that treat each input independently, RNNs have feedback loops allowing them to use past information to influence current outputs. They're commonly used for tasks like text generation, translation, and speech recognition where context matters.
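The feedback loop can be sketched in a few lines of plain Python. This is a single-unit toy, assuming scalar inputs and hand-picked weights (w_x, w_h, and b are illustrative values, not trained ones). Real RNNs use weight matrices learned from data, but the recurrence has the same shape.

```python
import math


def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent update: mix the current input with the previous hidden state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed a sequence through the recurrent unit and return every hidden state."""
    h = 0.0  # the "memory" starts empty
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h, b)  # h carries information forward in time
        states.append(h)
    return states


# Two sequences that end with the same input (0.0) but have different histories
states_a = run_rnn([1.0, 0.0, 0.0])
states_b = run_rnn([0.0, 0.0, 0.0])

print(states_a[-1])  # still non-zero: the early 1.0 echoes through the hidden state
print(states_b[-1])  # 0.0: there was nothing to remember
```

Both sequences end with the same input, yet the final hidden states differ, because the first sequence's earlier input is still carried in h. That carried-over state is the "memory of previous inputs" described above.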