[#19] Scale AI improves how LLMs generate code

Plus predicting whether a startup will succeed, how TikTok content can be moderated, low-light image enhancement, explainable AI, and a new 7B-parameter model from AI2

Hello readers, in this issue we cover:

  • Scale AI finds a better way to generate code with LLMs

  • Predicting a startup’s odds of success using Crunchbase data

  • How offensive TikTok content can be detected with ML models

  • Extreme low-light image enhancement with a new type of neural network

  • What is Explainable AI?

  • Allen Institute for Artificial Intelligence releases the most open model with 7 billion parameters

Note: Some summaries, not all, were generated with ChatGPT or Claude.

📰 Scale AI finds a better way to generate code

PLANSEARCH makes observations about a problem to improve code generation

Researchers at Scale AI created a new algorithm called PLANSEARCH to improve the code generation capabilities of LLMs. The algorithm enhances the diversity of outputs by searching over natural language plans (high-level descriptions of solutions) rather than directly over code solutions. This approach is motivated by the observation that a lack of diversity in model outputs can limit the effectiveness of inference-time search for generating correct code.

PLANSEARCH achieves state-of-the-art results on several benchmarks, including HumanEval+, MBPP+, and LiveCodeBench.
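
In rough outline, PLANSEARCH generates observations about the problem, combines subsets of them into plans, and only then writes code. Below is a heavily simplified sketch of that loop; the llm() helper is hypothetical (standing in for any chat-model call), and the real algorithm also derives second-order observations and passes through pseudocode before emitting code.

```python
from itertools import combinations

def llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to any chat model, return its reply."""
    raise NotImplementedError  # wire up your LLM client of choice here

def plansearch(problem: str, n_obs: int = 5, subset_size: int = 2) -> list[str]:
    # 1. Elicit several distinct natural-language observations about the problem.
    observations = [
        llm(f"State one useful observation (#{i}) for solving:\n{problem}")
        for i in range(n_obs)
    ]
    solutions = []
    # 2. Combine subsets of observations into high-level plans -- searching
    #    over plans rather than code is what creates the output diversity.
    for subset in combinations(observations, subset_size):
        plan = llm(
            "Combine these observations into a solution plan:\n"
            + "\n".join(subset) + f"\n\nProblem:\n{problem}"
        )
        # 3. Only at the end is each plan translated into code.
        solutions.append(llm(f"Implement this plan in Python:\n{plan}"))
    return solutions
```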

🧑🏻‍💻 Researchers predict a startup’s odds of success using data from Crunchbase

The features used to predict a startup’s odds of success

Researchers at LMU Munich created an ML model that predicts whether a startup will be successful (defined as an IPO, an acquisition, or external funding). They collected profiles of 20,000 startups from Crunchbase and built predictive models based on features such as founder count, educational background, and funding amount.

They concluded that a startup's text description on Crunchbase was actually the strongest predictor of its success. The self-description is written at the discretion of the founders, but it may convey information about a startup's business model, innovation, or market structure.

The startup's age was the second-strongest predictor of success. On average, higher age had a negative feature attribution in the success prediction. This may indicate that startups with a longer market presence face reduced probabilities of success, for example because of lower perceived growth potential and doubts about the viability of a business model that has not yet paid off.
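
As a rough illustration of this kind of model (not the paper's exact pipeline; the file and column names below are hypothetical), one could combine TF-IDF features from the free-text description with the tabular features and fit a gradient-boosted classifier, whose feature attributions can then be inspected with a tool like SHAP:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline

df = pd.read_csv("startups.csv")  # assumed schema: one row per startup

features = ColumnTransformer([
    # Free-text self-description -> bag-of-words TF-IDF features.
    ("text", TfidfVectorizer(max_features=2000), "description"),
    # Tabular features pass through unchanged.
    ("tabular", "passthrough",
     ["founder_count", "startup_age_years", "total_funding_usd"]),
])

model = Pipeline([("features", features),
                  ("clf", GradientBoostingClassifier())])

# success: 1 if the startup had an IPO, acquisition, or external funding.
model.fit(df.drop(columns="success"), df["success"])
```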

📱How offensive TikTok content can be detected with ML models

Most common emojis used with offensive TikTok content

Researchers at the Belgian Royal Military Academy collected and analyzed TikTok data containing offensive content, and built a series of ML models for offensive content detection. They gathered about 121,000 TikTok comments and found that their models performed quite well, reaching an F1 score of 0.863 (see the Today I Learned section at the bottom to learn about F1 scores).

The researchers built binary classifiers using BERT, logistic regression, and NB BERT, training them on the collected TikTok comments.
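
As a toy illustration of the logistic-regression baseline (the comments below are made up; the BERT-based variants fine-tune a pretrained transformer on the same labels instead):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-ins for the labeled TikTok comments (1 = offensive, 0 = not).
comments = ["you are pathetic", "great video!", "nobody likes you",
            "love this song", "what an idiot", "so wholesome"]
labels = [1, 0, 1, 0, 1, 0]

# TF-IDF features feeding a logistic-regression binary classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(comments, labels)

print(clf.predict(["what a pathetic video"]))  # e.g. array([1])
```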

📸 Extreme Low Light Image Enhancement with Kolmogorov-Arnold Networks (code included)

Pitch-black lighting in pictures can be enhanced to reveal what's in the dark

Researchers have developed a new way to improve poorly lit, almost pitch black photos.

The team used a new kind of network called Kolmogorov-Arnold networks (KANs). These networks are good at understanding complex relationships in data. They applied this technology to enhance low-light images, which helps overcome some limitations of existing methods.

To make the results look even better to the human eye, they also incorporated techniques that focus on how we perceive different aspects of an image.
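
The paper's exact perceptual objective isn't detailed here, but a common way to encode "how we perceive an image" is a perceptual loss computed in the feature space of a pretrained network; a minimal sketch, assuming a VGG-16 backbone:

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(nn.Module):
    def __init__(self, layers: int = 16):
        super().__init__()
        # Frozen feature extractor: the first `layers` ops of VGG-16
        # (downloads pretrained weights on first use).
        self.features = vgg16(weights=VGG16_Weights.DEFAULT).features[:layers].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, enhanced, reference):
        # Distance in feature space tracks perceived similarity better
        # than raw pixel error. (Real use would also normalize inputs
        # with ImageNet statistics.)
        return nn.functional.l1_loss(self.features(enhanced),
                                     self.features(reference))

loss_fn = PerceptualLoss()
loss = loss_fn(torch.rand(1, 3, 224, 224), torch.rand(1, 3, 224, 224))
```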

When they tested their new method against other approaches using standard datasets, it performed very well. This suggests that their technique could be a promising new way to improve the quality of dark or poorly lit photos.

🧑🏻‍🏫 What is Explainable AI?

Explainability Taxonomy

Explainable AI (XAI) attempts to address the black-box nature of AI models by providing explanations of how they make decisions and predictions, ensuring transparency, accountability, and fairness.

This paper captures the current terminology, research, and applications to help AI developers understand the current state of XAI. It is a great read for those looking to enhance the trustworthiness and transparency of their models.

🧠 AI2 releases a 7-billion-parameter model that is the most open yet. Weights, data, and code provided.

OLMoE is a new MoE model that has the best price/performance

Researchers at the Allen Institute for AI introduced OLMoE, a state-of-the-art, fully open language model that uses a sparse Mixture-of-Experts (MoE) architecture. The model contains 7 billion parameters but activates only 1 billion parameters per input token thanks to its MoE design. It is pretrained on 5 trillion tokens and further adapted into OLMoE-1B-7B-Instruct. The model outperforms all other available models with a similar number of active parameters and even surpasses larger models like Llama2-13B-Chat and DeepSeekMoE-16B.
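
To see why only about 1 billion of the 7 billion parameters fire per token, here is a minimal sketch of a sparse MoE layer; the dimensions, expert count, and routing details are illustrative, not OLMoE's actual configuration:

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=64, k=8):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts))
        self.k = k

    def forward(self, x):                         # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)         # mix only the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique().tolist():
                mask = idx[:, slot] == e          # tokens routed to expert e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out  # each token ran k of n_experts experts; the rest stayed idle

layer = SparseMoE()
y = layer(torch.randn(10, 512))  # (10, 512)
```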

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:

Binary Classifier

A binary classifier is a type of machine learning model that categorizes data into one of two possible classes. It is the simplest form of classifier and is used when the outcome to be predicted can take only one of two possible values (e.g. Yes/No, True/False). For example: is a comment offensive, yes or no?

F1 Score

The F1 score is a performance metric for classification models that combines two other performance metrics, precision and recall, into a single number between 0 and 1. A higher F1 score indicates better model performance.
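
For reference, it is the harmonic mean of the two:

F1 = 2 × (precision × recall) / (precision + recall)

A model only scores high if precision and recall are both high; a weakness in either drags the harmonic mean down.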

HumanEval+

HumanEval+ is an extended benchmark for evaluating the performance of code generation models. It builds upon the original HumanEval benchmark by OpenAI. It aims to provide a more rigorous and thorough evaluation of code generation models, enabling developers to identify strengths and weaknesses in different models.

Kolmogorov-Arnold Networks (KAN)

KANs were introduced in April 2024 by researchers at MIT, Caltech, Northeastern University, and the NSF AI Institute for Artificial Intelligence and Fundamental Interactions.

Inspired by the Kolmogorov-Arnold representation theorem, KANs are promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights").
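
A minimal sketch of that idea in PyTorch (simplified: the original paper parameterizes the edge functions with B-splines plus a residual term, while this toy version uses Gaussian radial basis functions):

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    def __init__(self, in_dim, out_dim, num_basis=8, grid_range=(-2.0, 2.0)):
        super().__init__()
        # Fixed RBF centers spread across the expected input range.
        self.register_buffer("centers", torch.linspace(*grid_range, num_basis))
        self.gamma = 2.0  # RBF width
        # One coefficient per (edge, basis function): these ARE the layer's
        # learnable "activation functions on edges".
        self.coef = nn.Parameter(torch.randn(out_dim, in_dim, num_basis) * 0.1)

    def forward(self, x):
        # x: (batch, in_dim). Evaluate every basis function at every input:
        # shape (batch, in_dim, num_basis).
        basis = torch.exp(-self.gamma * (x.unsqueeze(-1) - self.centers) ** 2)
        # phi_ji(x_i) = sum_k coef[j, i, k] * basis_k(x_i), then sum over i.
        return torch.einsum("bik,oik->bo", basis, self.coef)

model = nn.Sequential(KANLayer(2, 16), KANLayer(16, 1))
y = model(torch.randn(4, 2))  # (4, 1)
```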