[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

Link

The paper introduces GPTScan to detect logic bugs in smart contracts. GPTScan combines LLM and traditional static analysis tools to create a new detection tool.

GPTScan depends little on the LLM, which only serves as a role of determining whether the target function has a bug or not. What’s more, the criteria for determining the bug is hand-written. So, only a small part of the tool is composed of LLM.

GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, as well as a recall of over 70% for detecting ground-truth logic vulnerabilities.

Read more
[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models

[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models

Link here

This paper demonstrates a new approach to replaying the Android bugs. More specifically, creates a new tool called AdbGPT to automatedly convert bug reports to reproduction. For the result, AdbGPT is able to reproduce 81.3% of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines and ablation studies.

Background:

Bug reports often go on to contain the steps to reproduce (S2Rs) the bugs that assist developers to replicate and rectify the bugs, albeit with considerable amounts of engineering effort.

Read more
[Review] Automated Program Repair in the Era of Large Pre-trained Language Models

[Review] Automated Program Repair in the Era of Large Pre-trained Language Models

Link here

The paper presents the first extensive evaluation of recent LLMs for fixing real-world projects. It evaluates the effectiveness of the Automated Program Repair(ARP) in the era of LLMs.

Several conclusions were drawn:

  • As we increase the size of the model, we also increase in the number of correct and plausible patches generated.
  • Successfully utilizing the code after the buggy lines is important for fixing bugs.
  • While LLMs have the capability to perform fault localization and repair in one shot, for real world software systems, it is still more cost-effective to first use traditional fault localization techniques to pinpoint the precise bug locations and then leverage LLMs for more targeted patch generation.
  • By directly applying LLMs for APR without any specific change/finetuning, we can already achieve the highest number of correct fixes compared to existing baselines.
  • Entropy computation via LLMs can help distinguish correct patches from plausible patches.
  • Sum entropy performs slightly better compared to mean entropy.
Read more
[Review] On the Naturalness of Software

[Review] On the Naturalness of Software

Link here

A classical paper showing software also has its own naturalness like natural languages, demonstrating the basics of programming prediction and completion.

  • Natural languages are repetitive and predictable, which can be processed by statistical approaches(NLP). Programming code is also very regular, and even more so than natural languages.
  • Demonstrate, using standard cross-entropy and perplexity measures, that the above model is indeed capturing the high-level statistical regularity that exists in software at the n-gram level (probabilistic chains of tokens).
  • Regularities are specific to both projects and to application domains.
Read more
[Review] Feedback-directed Random Test Generation

[Review] Feedback-directed Random Test Generation

Link here

The paper presents a technique to improve random test generation by incorporating feedback obtained from executing test inputs as they are created.

This paper aims to exposing the potential faults in objects(e.g., Java class), i.e., object oriented, by generating a sequence of method calls to explore bugs.

Background

Random testing is of low efficiency, and may generate useless and redundant test sequences. So RANDOOP is proposed to handle this problem.

Implementation

  • Randomly select some method sequences that have been checked with no error.
Read more