[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

The paper introduces GPTScan, a tool that detects logic vulnerabilities in smart contracts by combining an LLM (GPT) with traditional static program analysis.

GPTScan relies on the LLM only lightly: the model's sole role is to decide whether a target function contains a bug. Moreover, the criteria for that decision are hand-written. As a result, the LLM makes up only a small part of the tool.
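
A minimal sketch of this division of labour, not GPTScan's actual implementation: the `Rule` format, the prompt wording, and the `fake_llm` stand-in are all hypothetical names introduced for illustration. The point is only that a hand-written criterion is turned into a yes/no question and the model returns a verdict for one function at a time.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hand-written vulnerability criterion (hypothetical format)."""
    name: str
    question: str  # yes/no question describing the suspicious behaviour


def build_prompt(rule: Rule, function_source: str) -> str:
    """Combine the hand-written criterion with the target function."""
    return (
        f"{rule.question}\n"
        "Answer strictly with 'yes' or 'no'.\n\n"
        f"Function:\n{function_source}"
    )


def matches(rule: Rule, function_source: str,
            ask_llm: Callable[[str], str]) -> bool:
    """The model's only job: a yes/no verdict for one function and one rule."""
    answer = ask_llm(build_prompt(rule, function_source))
    return answer.strip().lower().startswith("yes")


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption: replace with an API client)."""
    return "no" if "onlyOwner" in prompt else "yes"


rule = Rule(
    name="unchecked-setter",  # illustrative rule, not taken from the paper
    question="Does this function change contract state without checking the caller?",
)
unsafe = "function setFee(uint f) public { fee = f; }"
safe = "function setFee(uint f) public onlyOwner { fee = f; }"
print(matches(rule, unsafe, fake_llm), matches(rule, safe, fake_llm))
```

Keeping the model behind a plain callable makes it easy to swap in a real client while the rules stay ordinary hand-written data.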

GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, as well as a recall of over 70% for detecting ground-truth logic vulnerabilities.
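
As a reminder of how these two metrics are computed, here is a small sketch. The counts are illustrative assumptions chosen to produce similar percentages; they are not figures taken from the paper.

```python
def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of reported bugs that are real."""
    reported = true_pos + false_pos
    return true_pos / reported if reported else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real bugs that were reported."""
    actual = true_pos + false_neg
    return true_pos / actual if actual else 0.0


# Illustrative counts only (assumption): 4 correct reports out of 7,
# and 5 of 7 known bugs found.
print(round(precision(4, 3) * 100, 2))  # 57.14
print(round(recall(5, 2) * 100, 2))     # 71.43
```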

[Review] Towards Precise Reporting of Cryptographic Misuses

The paper investigates cryptographic API misuse in Java. In brief, it studies current misuse detection techniques and analyzes the false positives and true positives they report. It identifies the root causes of the high false positive rate and of the invalid true positives.

Introduction:

Many cryptographic misuse detection techniques have been proposed, but they suffer from a high false positive rate. Additionally, many misuse alarms may not be actionable for developers, and previous work may have overestimated the number of misuses and vulnerabilities.

[Review] Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting

The paper explores the ability of ChatGPT (specifically ChatGPT, not LLMs in general) to find failure-inducing tests, and proposes a new method called Differential Prompting for this task. It achieves a success rate of 75% on QuixBugs programs and 66.7% on Codeforces programs.

This approach may only be useful for small programs (fewer than 100 LOC).

Background:

Failure-inducing tests are test cases that trigger bugs in a given program. Finding such tests is a main objective in software engineering, but it is challenging in practice.

Recently, applying LLMs (e.g., ChatGPT) to software engineering has become popular, but directly applying ChatGPT to this task is challenging and performs poorly, because ChatGPT is insensitive to nuances (i.e., subtle differences between two similar sequences of tokens). This makes it hard for ChatGPT to identify bugs, since a bug is essentially a nuance between a buggy program and its fixed version.
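
To make "nuance" and "failure-inducing test" concrete, here is a small sketch. The two functions and the candidate inputs are made up for illustration, and the search loop is a generic differential comparison, not the paper's pipeline. The buggy and fixed versions differ only in how the running maximum is initialised, and only some inputs expose that difference.

```python
from typing import Callable, Iterable, Optional


def buggy_max(xs: list[int]) -> int:
    """Intended to return the largest element; wrong when all values are negative."""
    best = 0  # bug: should start from xs[0]
    for x in xs:
        if x > best:
            best = x
    return best


def fixed_max(xs: list[int]) -> int:
    """Reference version: the only change is the initial value of `best`."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def find_failure_inducing(
    program: Callable[[list[int]], int],
    reference: Callable[[list[int]], int],
    candidates: Iterable[list[int]],
) -> Optional[list[int]]:
    """Return the first input on which the two versions disagree, if any."""
    for test_input in candidates:
        if program(test_input) != reference(test_input):
            return test_input
    return None


candidates = [[1, 2, 3], [5], [0, -1], [-3, -2]]
print(find_failure_inducing(buggy_max, fixed_max, candidates))  # [-3, -2]
```

The first three inputs pass on both versions, which shows why a one-token difference in the source can be hard to spot by inspection alone.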

[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models

This paper presents a new approach to replaying Android bugs. More specifically, it introduces a tool called AdbGPT that automatically reproduces bugs from bug reports. In the evaluation, AdbGPT reproduces 81.3% of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines and ablation studies.

Background:

Bug reports often contain the steps to reproduce (S2Rs) the bugs, which help developers replicate and fix them, albeit with a considerable amount of engineering effort.

[Review] Titan : Efficient Multi-target Directed Greybox Fuzzing

The paper presents a multi-target directed greybox fuzzing method that fuzzes different targets at the same time.

Titan is proposed for this purpose. It enables the fuzzer to identify correlations among the various targets in the program and, based on these correlations, to optimize input generation so that different targets are fuzzed efficiently and simultaneously.

Introduction:

In practice, more than 1000 potential targets may need verification, which is costly. Current directed fuzzing aims at only one target at a time, which lowers verification efficiency. Running multiple fuzzing instances to cover multiple targets does not help either: it is 3.6x slower than sequentially running one instance at a time for one target.
