Posted 2024-06-04 | Updated 2024-06-05 | Research | 2 minutes read (About 323 words)

[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

The paper introduces GPTScan, a tool for detecting logic bugs in smart contracts. GPTScan combines an LLM with traditional static analysis.

GPTScan relies on the LLM only to a limited degree: the LLM just determines whether a target function contains a given bug, and the criteria for that determination are hand-written. So only a small part of the tool is LLM-based.
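To make that division of labour concrete, here is a minimal sketch assuming a deliberately simplified pipeline: a cheap static pre-filter picks candidate functions, and an LLM only answers a hand-written yes/no question about each one. It is not GPTScan's actual implementation. `Scenario`, `prefilter`, `scan`, the keyword filter, and `fake_llm` (a stand-in for a real model call) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Function:
    name: str
    source: str


@dataclass
class Scenario:
    name: str
    keywords: Tuple[str, ...]  # cheap static pre-filter (hypothetical)
    question: str              # hand-written yes/no question for the LLM


def prefilter(functions: List[Function], scenario: Scenario) -> List[Function]:
    """Stand-in for static analysis: keep functions mentioning every keyword."""
    return [f for f in functions if all(k in f.source for k in scenario.keywords)]


def scan(functions: List[Function], scenarios: List[Scenario],
         ask_llm: Callable[[str], str]) -> List[Tuple[str, str]]:
    findings = []
    for scenario in scenarios:
        for func in prefilter(functions, scenario):
            prompt = f"{scenario.question}\nAnswer yes or no.\n\n{func.source}"
            # The LLM only gives a yes/no verdict; the criterion itself is hand-written.
            if ask_llm(prompt).strip().lower().startswith("yes"):
                findings.append((scenario.name, func.name))
    return findings


def fake_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model call: "yes" when no require() guard is present.
    return "no" if "require(" in prompt else "yes"


functions = [
    Function("transferFrom", "function transferFrom(address a, address b, uint v) "
                             "{ balances[a] -= v; balances[b] += v; }"),
    Function("safeTransferFrom", "function safeTransferFrom(address a, address b, uint v) "
                                 "{ require(allowance[a][msg.sender] >= v); "
                                 "balances[a] -= v; balances[b] += v; }"),
    Function("name", "function name() public view returns (string memory) { return _name; }"),
]
scenario = Scenario(
    name="missing allowance check",
    keywords=("balances",),
    question="Does this function move tokens out of another account "
             "without checking the caller's allowance?",
)
findings = scan(functions, [scenario], fake_llm)
print(findings)  # [('missing allowance check', 'transferFrom')]
```

The `name` function never reaches the LLM because the static filter drops it. This mirrors the point above that most of the detection logic lives outside the model.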

GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, as well as a recall of over 70% for detecting ground-truth logic vulnerabilities.

Posted 2023-11-22 | Updated 2023-11-23 | Research | 3 minutes read (About 418 words)

[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models

Link here

This paper presents a new approach to replaying Android bugs. More specifically, it introduces a tool called AdbGPT that automatically converts bug reports into reproductions. AdbGPT is able to reproduce 81.3% of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines and ablation studies.

Background:

Bug reports often contain the steps to reproduce (S2Rs) the bugs, which help developers replicate and fix them, although this still takes a considerable amount of engineering effort.
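As a rough illustration of turning S2Rs into replayable actions, the sketch below extracts structured steps from a report and maps them to `adb shell input` commands. AdbGPT itself uses LLM prompting for both extraction and replay; here a few regular expressions stand in for the LLM, and the action subset, the `Step` type, and the `WIDGETS` coordinate table are hypothetical. A real tool would locate widgets in the live GUI instead of using fixed coordinates.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    action: str                  # "tap", "input", or "scroll" (illustrative subset)
    target: str                  # widget named in the bug report
    value: Optional[str] = None  # text to type, or scroll direction


# Regex stand-in for the LLM extraction step; order matters (most specific first).
PATTERNS = [
    (re.compile(r'\b(?:type|enter|input) "([^"]+)" (?:in|into) (?:the )?(.+)', re.I), "input"),
    (re.compile(r'\bscroll (up|down)(?: (?:in|on) (?:the )?(.+))?', re.I), "scroll"),
    (re.compile(r'\b(?:tap|click|press)(?: on)? (?:the )?(.+)', re.I), "tap"),
]

# Hypothetical widget-to-coordinate map; a real tool would inspect the live GUI.
WIDGETS = {"search box": (540, 200), "search button": (980, 200), "screen": (540, 1200)}


def extract_steps(report: str) -> List[Step]:
    steps = []
    for sentence in re.split(r"[.\n]+", report):
        for pattern, action in PATTERNS:
            m = pattern.search(sentence.strip())
            if not m:
                continue
            if action == "input":
                steps.append(Step("input", m.group(2).strip(), m.group(1)))
            elif action == "scroll":
                steps.append(Step("scroll", (m.group(2) or "screen").strip(), m.group(1).lower()))
            else:
                steps.append(Step("tap", m.group(1).strip()))
            break  # first matching pattern wins
    return steps


def to_adb(step: Step) -> str:
    x, y = WIDGETS[step.target]
    if step.action == "tap":
        return f"adb shell input tap {x} {y}"
    if step.action == "input":
        # `input text` does not take raw spaces; %s is the usual escape.
        return "adb shell input text " + step.value.replace(" ", "%s")
    # Scrolling down means swiping the finger upward, and vice versa.
    dy = -600 if step.value == "down" else 600
    return f"adb shell input swipe {x} {y} {x} {y + dy}"


report = 'Tap the search box. Type "cats" into the search box. Tap the search button. Scroll down.'
for cmd in map(to_adb, extract_steps(report)):
    print(cmd)
```

The brittleness of the hand-written patterns (any phrasing they do not anticipate is silently skipped) is exactly the gap that an LLM-based extractor is meant to close.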

Posted 2023-11-19 | Updated 2023-11-19 | Research | 3 minutes read (About 427 words)

[Review] Automated Program Repair in the Era of Large Pre-trained Language Models

Link here

The paper presents the first extensive evaluation of recent LLMs for fixing bugs in real-world projects, assessing the effectiveness of Automated Program Repair (APR) in the era of LLMs.

Several conclusions were drawn:

As model size increases, so does the number of correct and plausible patches generated.
Successfully utilizing the code after the buggy lines is important for fixing bugs.
While LLMs have the capability to perform fault localization and repair in one shot, for real world software systems, it is still more cost-effective to first use traditional fault localization techniques to pinpoint the precise bug locations and then leverage LLMs for more targeted patch generation.
By directly applying LLMs for APR without any specific change/finetuning, we can already achieve the highest number of correct fixes compared to existing baselines.
Entropy computation via LLMs can help distinguish correct patches from plausible patches.
Sum entropy performs slightly better than mean entropy.
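To show what the last two points compare, here is a minimal sketch of sum and mean entropy over a patch's per-token probabilities. It assumes the model's probability for each generated token is already available, and it assumes lower entropy (more "natural" to the model) means a higher rank. The probabilities below are made up for illustration and are not from the paper.

```python
import math
from typing import Callable, Dict, List


def sum_entropy(token_probs: List[float]) -> float:
    """Negative log-likelihood of the whole patch (natural log)."""
    return -sum(math.log(p) for p in token_probs)


def mean_entropy(token_probs: List[float]) -> float:
    """Sum entropy normalized by the number of tokens in the patch."""
    return sum_entropy(token_probs) / len(token_probs)


def rank_patches(patches: Dict[str, List[float]],
                 score: Callable[[List[float]], float] = sum_entropy) -> List[str]:
    # Assumption: lower entropy = more "natural" to the model = ranked first.
    return sorted(patches, key=lambda name: score(patches[name]))


# Hypothetical per-token probabilities for three plausible patches.
patches = {
    "patch_a": [0.9, 0.8, 0.95],
    "patch_b": [0.6, 0.5],
    "patch_c": [0.9] * 6,
}
print(rank_patches(patches, sum_entropy))   # ['patch_a', 'patch_c', 'patch_b']
print(rank_patches(patches, mean_entropy))  # ['patch_c', 'patch_a', 'patch_b']
```

The two scores disagree on `patch_c`: sum entropy grows with every extra token, so it implicitly favours shorter patches, while mean entropy removes that length effect.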

Posted 2023-11-14 | Updated 2023-11-15 | Research | a minute read (About 149 words)

[Review] On the Naturalness of Software

Link here

A classic paper showing that software, like natural language, has its own "naturalness", laying the groundwork for statistical code prediction and completion.

Natural languages are repetitive and predictable, so they can be processed with statistical approaches (NLP). Programming code is also very regular, even more so than natural language.
The authors demonstrate, using standard cross-entropy and perplexity measures, that an n-gram language model captures the high-level statistical regularity that exists in software at the n-gram level (probabilistic chains of tokens).
These regularities are specific both to projects and to application domains.
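The cross-entropy measure above can be sketched with a tiny bigram model over code tokens. Cross-entropy is the average number of bits per token, H = -(1/n) * sum(log2 p(t_i | t_{i-1})), and perplexity is 2^H. This is an illustration only: the crude regex lexer and add-one smoothing are simplifying assumptions, not the paper's exact models or smoothing.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(code: str) -> List[str]:
    # Crude lexer: words/numbers, or single punctuation characters.
    return re.findall(r"\w+|[^\w\s]", code)


class BigramModel:
    def __init__(self, tokens: List[str]):
        seq = ["<s>"] + tokens
        self.vocab_size = len(set(seq)) + 1   # +1 slot for unseen tokens
        self.context = Counter(seq[:-1])      # count of each token as a predecessor
        self.pairs = Counter(zip(seq, seq[1:]))

    def prob(self, prev: str, tok: str) -> float:
        # Add-one (Laplace) smoothing, so unseen pairs get a small nonzero probability.
        return (self.pairs[(prev, tok)] + 1) / (self.context[prev] + self.vocab_size)

    def cross_entropy(self, tokens: List[str]) -> float:
        """Average bits per token: H = -(1/n) * sum(log2 p(t_i | t_{i-1}))."""
        seq = ["<s>"] + tokens
        return -sum(math.log2(self.prob(p, t)) for p, t in zip(seq, seq[1:])) / len(tokens)


train = "for (int i = 0; i < n; i++) { sum += a[i]; }\n" * 3
model = BigramModel(tokenize(train))
familiar = tokenize("for (int i = 0; i < n; i++) { sum += a[i]; }")
unusual = tokenize("while (x != y) { swap(p, q); }")
for name, toks in [("familiar", familiar), ("unusual", unusual)]:
    h = model.cross_entropy(toks)
    print(f"{name}: {h:.2f} bits/token, perplexity {2 ** h:.1f}")
```

Code that repeats patterns the model has seen gets fewer bits per token than code it has not. The paper's "naturalness" claim amounts to showing that real code scores low on this measure.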

Posted 2023-11-09 | Updated 2023-11-12 | Research | a minute read (About 185 words)

[Review] Feedback-directed Random Test Generation

Link here

The paper presents a technique to improve random test generation by incorporating feedback obtained from executing test inputs as they are created.

The paper aims to expose potential faults in object-oriented code (e.g., Java classes) by generating sequences of method calls.
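The feedback loop described above can be sketched roughly as follows. This is not Randoop's implementation (Randoop generates Java tests). `BuggyStack` is a hypothetical class with a planted bug, `contract_holds` is a hypothetical oracle, and detecting redundancy by comparing object states is a simplification. The sketch extends previously kept sequences with one random call, executes them, and uses the outcome to decide whether to discard the sequence, report it as a failure, or keep it as a building block.

```python
import random
from typing import List, Tuple

Call = Tuple[str, tuple]


class BuggyStack:
    """Hypothetical class under test with a planted bug in size()."""

    def __init__(self) -> None:
        self.items: List[int] = []

    def push(self, x: int) -> None:
        self.items.append(x)

    def pop(self) -> int:
        if not self.items:
            raise IndexError("pop from empty stack")  # illegal input, not a bug
        return self.items.pop()

    def size(self) -> int:
        # Planted bug: undercounts once the stack holds more than two items.
        n = len(self.items)
        return n if n <= 2 else n - 1


def contract_holds(s: BuggyStack) -> bool:
    # Hypothetical oracle: size() must agree with the stored items.
    return s.size() == len(s.items)


def run(seq: List[Call]) -> BuggyStack:
    s = BuggyStack()
    for name, args in seq:
        getattr(s, name)(*args)
    return s


def feedback_directed(iterations: int = 300, seed: int = 0):
    rng = random.Random(seed)
    pool: List[List[Call]] = [[]]  # error-free sequences, reused as building blocks
    seen = {()}                    # object states already represented in the pool
    failures: List[List[Call]] = []
    for _ in range(iterations):
        base = rng.choice(pool)
        call = rng.choice([("push", (rng.randint(0, 9),)), ("pop", ()), ("size", ())])
        seq = base + [call]
        try:
            s = run(seq)
        except IndexError:
            continue               # feedback: illegal sequence, never extend it
        if not contract_holds(s):
            failures.append(seq)   # feedback: contract violation, report as a failing test
            continue
        state = tuple(s.items)
        if state in seen:
            continue               # feedback: redundant, adds no new object state
        seen.add(state)
        pool.append(seq)
    return pool, failures


pool, failures = feedback_directed()
print(len(pool), "sequences kept;", len(failures), "failing tests")
print("shortest failure:", min(failures, key=len) if failures else None)
```

Compared with blind random generation, pruning illegal and redundant sequences keeps the pool small and valid, so new calls are built on top of meaningful object states.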

Background

Purely random testing is inefficient and may generate useless and redundant test sequences. RANDOOP is proposed to address this problem.

Implementation