[#18] How to detect LLM-generated code

Plus PatentGPT can generate patents automatically, NVIDIA defends RAG vs long-context windows, facial recognition on pixelated images, and why AI agents perform poorly

Hello readers, in this issue we cover:

  • Detecting LLM-generated code by identifying the distinctive characteristics it exhibits, such as the number of code lines

  • PatentGPT can automatically generate patent drafts

  • NVIDIA defends RAG in the era of long-context windows

  • Facial recognition on low-resolution images

  • Why AI agents perform poorly in real-world applications

👨🏼‍💻 How to detect code generated by an LLM

LLMs generate code with distinctive characteristics, such as average code lines and blank lines, and those characteristics can be detected.

AI-native code editors like Cursor and GitHub’s Copilot are gaining in popularity. With the rise in LLM-generated code, there will also be a rise in defective, vulnerable, or suboptimal code.

Researchers used 22 software features, such as code lines, cyclomatic complexity, and comment lines, and built ML models to identify which features make LLM-generated code snippets detectable.

Using Claude 3 as a test case, they found that it tends to generate longer functions, but shorter classes, than humans do. These characteristics can be used to detect Claude 3-generated code with up to 82% accuracy.
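To make the idea concrete, here is a minimal sketch of what extracting a few of these stylometric features from a code snippet could look like. This is an illustration, not the authors' implementation: the feature names, and the use of Python's `ast` module to measure function length, are my assumptions; the paper's full 22-feature set is broader.

```python
import ast

def extract_features(source: str) -> dict:
    """Compute a few illustrative stylometric features of a Python snippet.

    A sketch of the kind of features the paper uses (code lines, blank
    lines, comment lines, function length); not the authors' code.
    """
    stripped = [ln.strip() for ln in source.splitlines()]
    tree = ast.parse(source)

    # Length in lines of each function definition found in the snippet.
    func_lengths = [
        node.end_lineno - node.lineno + 1
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]

    return {
        "code_lines": sum(1 for ln in stripped if ln and not ln.startswith("#")),
        "blank_lines": sum(1 for ln in stripped if not ln),
        "comment_lines": sum(1 for ln in stripped if ln.startswith("#")),
        "avg_function_length": sum(func_lengths) / len(func_lengths) if func_lengths else 0.0,
    }

snippet = '''
def add(a, b):
    # return the sum
    return a + b
'''
print(extract_features(snippet))
```

A feature vector like this, computed over many human-written and LLM-written snippets, is what the classifiers in the paper are trained on.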

🖋️ PatentGPT is an LLM that can generate patent documents

Turning ideas into a patent abstract

PatentGPT is an LLM that can generate drafts of patents. The authors write that it was developed to achieve automatic IP knowledge mining and IP creation, and that it is the first large-scale language model tailored for US patent applications. They add that the system can assist researchers and junior engineers in writing patents, improving the efficiency and accuracy of the patent creation process.

The researchers used knowledge fine-tuning on scientific texts to improve the performance of small-parameter language models, and designed PatentGPT to automatically mine knowledge and convert it into intellectual property. The model is further refined through RLHF to ensure the generated content conforms to human knowledge.

🪟 NVIDIA defends RAG in the era of long-context windows

OP-RAG outperforms vanilla RAG for long context LLMs

Many researchers suggest Retrieval Augmented Generation (RAG) will become obsolete when context windows become large enough. However, this paper from NVIDIA researchers defends the use of RAG and finds that extremely long contexts in LLMs suffer from degradation in answer quality.

The researchers also introduce a technique called Order-Preserved Retrieval Augmented Generation (OP-RAG), which produces high-quality answers with far fewer tokens than long-context LLMs.
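The core trick of OP-RAG is simple: select the most relevant chunks as usual, but arrange them in the prompt in their original document order rather than by relevance score. A minimal sketch, assuming the relevance scores have already been produced by some retriever (the scoring model is not shown here):

```python
def op_rag_context(chunks, scores, k):
    """Order-preserving retrieval sketch.

    `chunks` is a list of text chunks in document order and `scores` is a
    parallel list of relevance scores for the query (e.g. cosine
    similarities from an embedding model, assumed precomputed).
    """
    # Indices of the k highest-scoring chunks, as in vanilla RAG.
    top = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:k]
    # The OP-RAG twist: restore original document order before building
    # the context, instead of ordering chunks by relevance.
    top.sort()
    return [chunks[i] for i in top]

chunks = ["intro", "method", "results", "limitations", "conclusion"]
scores = [0.1, 0.9, 0.4, 0.8, 0.2]
print(op_rag_context(chunks, scores, k=3))
# top scorers are "method", "limitations", "results", but they are
# returned in document order: ['method', 'results', 'limitations']
```

Keeping document order preserves the narrative flow of the source text in the prompt, which is what the paper credits for the quality gains over relevance-ordered contexts.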

👨🏼‍🦲 Improving Facial Recognition with Low-Resolution Images

Facial recognition works well with clear, high-quality images, but it struggles with low-resolution ones due to missing details.

Researchers propose a new method that combines two techniques: "knowledge distillation" and "adaptation." Knowledge distillation is similar to a student learning from a teacher. Here, a "student" model learns from a "teacher" model that is trained on high-resolution images. This helps the student model get better at recognizing faces in low-resolution images by learning what’s important from the high-quality ones.

However, the problem is that the student model often doesn't perform well when faced with new images that look different from the training images. To fix this, the researchers add an "adaptation" step, where the model learns to adjust to new situations more effectively. They use something called "adaptive batch normalization," which helps the model better handle different types of images during testing.

Their experiments show that this method works better than existing ones, making it a significant improvement for low-resolution face recognition technology.

🤖 Why are AI agents poor performers in real-world applications?

The grounding pipeline shows how an AI agent executes steps in the browser

This paper dives into why AI agents perform poorly in real-world applications. The researchers distinguish between two primary components of AI agent task automation: planning and grounding. Planning involves finding the optimal steps to execute a task, while grounding involves how the agent interacts with its environment, such as web components.

They find that planning is the bottleneck, i.e. agents are poor at planning out their tasks, while grounding is not. More specifically, agents struggle to make accurate decisions about which web element to interact with next, rather than failing to interact with the chosen element.

🤯 Today I Learned

Every issue, we highlight new AI concepts and terminology to help educate our readers. This issue we learned about:

Adaptive Batch Normalization

Adaptive Batch Normalization helps a computer model work better when it encounters new or different types of data compared to what it was trained on. Normally, models use "batch normalization" to standardize their learning process based on training data. However, if the test data looks different, the model might struggle. Adaptive Batch Normalization fixes this by adjusting its learning based on the new data, making the model more flexible and accurate in real-world situations.
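The idea is easy to see in a toy example: keep the same normalization operation, but swap the source-domain statistics for statistics computed on the target data. This is a minimal stdlib sketch of that swap (scalar values, gamma=1, beta=0); real implementations operate per channel inside a neural network.

```python
from statistics import fmean, pvariance

def batch_norm(values, mean, var, eps=1e-5):
    """Standardize values with the given statistics (gamma=1, beta=0)."""
    return [(v - mean) / (var + eps) ** 0.5 for v in values]

# Statistics learned on the source (training) domain.
source = [0.0, 1.0, 2.0, 3.0]
src_mean, src_var = fmean(source), pvariance(source)

# Target-domain data with a shifted distribution
# (e.g. low-resolution face images instead of high-resolution ones).
target = [10.0, 11.0, 12.0, 13.0]

# Standard BN at test time reuses the source statistics, so the
# shifted target data comes out badly off-center.
standard = batch_norm(target, src_mean, src_var)

# Adaptive BN recomputes the statistics on the target data itself,
# re-centering the activations for the new domain.
tgt_mean, tgt_var = fmean(target), pvariance(target)
adaptive = batch_norm(target, tgt_mean, tgt_var)

print("standard BN output mean:", fmean(standard))   # far from 0
print("adaptive BN output mean:", fmean(adaptive))   # approximately 0
```

In a trained network the same substitution is applied layer by layer: the running mean and variance stored in each batch-norm layer are replaced with statistics estimated from the target-domain inputs, with no retraining of the weights.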

Mind2Web

A dataset for developing and evaluating generalist agents for the web that can follow language instructions to complete complex tasks on any website.

Planning

In the context of AI agents, planning refers to the agent’s ability to determine the appropriate sequence of actions to accomplish a given task.

This may include in-context planning, where agents adjust to tasks within a given context, and fine-tuned approaches that tailor agent behavior through training.

Grounding

In the context of AI agents, grounding involves correctly identifying and interacting with the relevant web elements based on the agent’s planning decisions.