[Review] Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

[Review] Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

Link

The paper demonstrates the effectiveness of LLM in static analysis.

The most important thing of this paper is the task division and the workflow design. First we need to figure out what the LLM is good at, and assign such tasks to it. What’s more, we need to care about the design of the workflow, which could significantly affect the final result.

Background

Traditional static analysis tools have some shortages. Embedding LLM into the toolchain can help the analysis.

In this paper, Use Before Initialization (UBI) bugs are chosen as the example.

UBITect, which is a tool for UBI bugs, has some shortcomings in detecting, and may discord some cases. LLM can help determine whether these bugs are true bugs.

Read more
[Review] Detecting Missed Security Operations Through Differential Checking of Object-based Similar Paths

[Review] Detecting Missed Security Operations Through Differential Checking of Object-based Similar Paths

Link

Problem: Missing a security operation, such as a bound check.

Traditional Methods: Cross-checking. Locate the potential bugs by exploiting a large number of similar code snippets and compare their patterns.

The paper proposes a new approach to locating bugs, which do not need a large number of cases. Instead, only two code snippets are required. To be specific, object-based similar-path pairs are constructed.

Read more
[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

Link

The paper introduces GPTScan to detect logic bugs in smart contracts. GPTScan combines LLM and traditional static analysis tools to create a new detection tool.

GPTScan depends little on the LLM, which only serves as a role of determining whether the target function has a bug or not. What’s more, the criteria for determining the bug is hand-written. So, only a small part of the tool is composed of LLM.

GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, as well as a recall of over 70% for detecting ground-truth logic vulnerabilities.

Read more
[Review] MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

[Review] MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

Link

The paper proposes the concept of Trace Distillation, that is, to distill or extract the key system calls from the original system call sequence without lowering the coverage, and these distilled sequences will be used as the seed for mutation during fuzzing.

From the distillation process, the dependencies between the system calls will be inferred to help distillation. So actually, the root cause of the speed-up is the dependency inference.

Use static analysis to achieve the seed distillation: inferring both explicit and implicit dependencies between system calls.

MoonShine improved Syzkaller’s test coverage for the Linux kernel by 13% and discovered 17 new previously-undisclosed vulnerabilities in the Linux kernel.

Introduction

Kernel fuzzing, an old topic.

Challenges: dependencies between system calls, kernel states for specific bug triggering.

Existing hand-coded rules are not scalable or effective.

Read more
[Review] One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels

[Review] One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels

Link

The paper mainly focuses on the reference counting(refcounting) bugs in Linux Kernel.

  1. Analyzes the history of 1,033 refcounting bugs in 753 versions of Linux Kernels from 2005 to 2022, and concludes 9 critical rules to check refcounting bugs.
  2. Designs a new tool applying these 9 rules, and detects 351 new bugs, of which 240 are confirmed.

Introduction

Reference counting bugs: the reference count is used to record the reference number of an object(similar to smart pointers in C++).

Potential risks: Memory leakage, UAF.

Read more
[Review] HEALER: Relation Learning Guided Kernel Fuzzing

[Review] HEALER: Relation Learning Guided Kernel Fuzzing

Link

The paper proposes a new technique called relation learning to help infer the relations between system calls when fuzzing the kernel.

Relation learning is achieved by constructing a relation graph, which is a two-dimensional graph with each cell representing the dependencies between two system calls.

The relation graph is built through static and dynamic learning. Static learning will infer the dependencies by analyzing the parameters and the return value of each system call. Dynamic learning will determine the dependencies by analyzing the generated minimized system call sequences.

Read more
[Review] Towards Precise Reporting of Cryptographic Misuses

[Review] Towards Precise Reporting of Cryptographic Misuses

Link here

The paper demonstrates an investigation into Java cryptographic misuse. To be brief, the paper does some research on current misuse detection techniques, analyzing the false positive cases and true positive cases they manifest. The paper discovers the root cause of high false positive rate and invalid true positive cases.

Introduction:

Many cryptographic misuse detection techniques have been proposed but with a high false positive rate. Additionally, many of the misuse alarms might not be very actionable to developers, and previous works might have overestimated the number of misuses and vulnerabilities.

Read more
[Review] How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications

[Review] How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications

Link here

The paper is a evaluation, which assesses the current runtime verification technology, and mainly the effectiveness of the existing API specifications.

Three conclusions:

  1. Current RV technology has matured enough with tolerable runtime overhead.
  2. Existing API specification can find many bugs that developers are willing to fix.
  3. The false alarm rates are quite high due to the ineffective specifications.
Read more
[Review] Python Crypto Misuses in the Wild

[Review] Python Crypto Misuses in the Wild

Link here

The paper conducts a study on Python crypto API misuses. A tool called LICMA is implemented aiming at detecting crypto API misuses in python.

Several conclusions:

  1. 52.26 % of the Python projects using crypto APIs contain at least a potential misuse.
  2. Only 14.81 % of the projects directly contain a misuse of a crypto API. The rest is introduced through third-party code.
  3. Most Python applications are more secure compared with C or Java, and the distribution between the concrete types of misuses differ a lot.
Read more
[Review] Evaluation of Static Vulnerability Detection Tools with Java Cryptographic API Benchmarks

[Review] Evaluation of Static Vulnerability Detection Tools with Java Cryptographic API Benchmarks

Link here

The paper assesses the performance of the current static vulnerability detection tools in the era of Java cryptographic API misuse.

Main contributions:

  • provide two benchmarks: CryptoAPI-Bench, ApacheCryptoAPI-Bench.
    • CryptoAPI-Bench consists of 181 test cases covering 16 types of Cryptographic and SSL/TLS API misuse vulnerabilities, with basic level and advanced level.
    • ApacheCryptoAPI-Bench documents the API misuse vulnerabilities from 10 real-world Apache projects. This benchmark is for checking the scalability(the ability to induce low computational overhead to analyze large code-bases) of the detection tool.
  • evaluate four static analysis tools based on the two proposed benchmarks: specialized tools(CryptoGuard, CrySL), general purpose tools(SpotBugs, Coverity).
Read more