<Pinned> How to Read a Paper

<Pinned> How to Read a Paper

It’s a good question that “how to read a paper ?”. After reading some papers I suddenly find myself actually don’t know how to effectively and efficiently read a paper. So I’ll make some collections here on this topic.

You can refer to the links above to get some general ideas (these links were copied from Lingming Zhang’s slides of cs527-s23).

According to William Griswold, try to answer the problem below:

Read more
[Review] Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

[Review] Assisting Static Analysis with Large Language Models: A ChatGPT Experiment

Link

The paper demonstrates the effectiveness of LLM in static analysis.

The most important thing of this paper is the task division and the workflow design. First we need to figure out what the LLM is good at, and assign such tasks to it. What’s more, we need to care about the design of the workflow, which could significantly affect the final result.

Background

Traditional static analysis tools have some shortages. Embedding LLM into the toolchain can help the analysis.

In this paper, Use Before Initialization (UBI) bugs are chosen as the example.

UBITect, which is a tool for UBI bugs, has some shortcomings in detecting, and may discord some cases. LLM can help determine whether these bugs are true bugs.

Read more
[Review] Detecting Missed Security Operations Through Differential Checking of Object-based Similar Paths

[Review] Detecting Missed Security Operations Through Differential Checking of Object-based Similar Paths

Link

Problem: Missing a security operation, such as a bound check.

Traditional Methods: Cross-checking. Locate the potential bugs by exploiting a large number of similar code snippets and compare their patterns.

The paper proposes a new approach to locating bugs, which do not need a large number of cases. Instead, only two code snippets are required. To be specific, object-based similar-path pairs are constructed.

Read more
[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

[Review] GPTScan: Detecting Logic Vulnerabilities in Smart Contracts by Combining GPT with Program Analysis

Link

The paper introduces GPTScan to detect logic bugs in smart contracts. GPTScan combines LLM and traditional static analysis tools to create a new detection tool.

GPTScan depends little on the LLM, which only serves as a role of determining whether the target function has a bug or not. What’s more, the criteria for determining the bug is hand-written. So, only a small part of the tool is composed of LLM.

GPTScan achieves high precision (over 90%) for token contracts and acceptable precision (57.14%) for large projects, as well as a recall of over 70% for detecting ground-truth logic vulnerabilities.

Read more
[Review] MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

[Review] MoonShine: Optimizing OS Fuzzer Seed Selection with Trace Distillation

Link

The paper proposes the concept of Trace Distillation, that is, to distill or extract the key system calls from the original system call sequence without lowering the coverage, and these distilled sequences will be used as the seed for mutation during fuzzing.

From the distillation process, the dependencies between the system calls will be inferred to help distillation. So actually, the root cause of the speed-up is the dependency inference.

Use static analysis to achieve the seed distillation: inferring both explicit and implicit dependencies between system calls.

MoonShine improved Syzkaller’s test coverage for the Linux kernel by 13% and discovered 17 new previously-undisclosed vulnerabilities in the Linux kernel.

Introduction

Kernel fuzzing, an old topic.

Challenges: dependencies between system calls, kernel states for specific bug triggering.

Existing hand-coded rules are not scalable or effective.

Read more
[Review] One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels

[Review] One Simple API Can Cause Hundreds of Bugs: An Analysis of Refcounting Bugs in All Modern Linux Kernels

Link

The paper mainly focuses on the reference counting(refcounting) bugs in Linux Kernel.

  1. Analyzes the history of 1,033 refcounting bugs in 753 versions of Linux Kernels from 2005 to 2022, and concludes 9 critical rules to check refcounting bugs.
  2. Designs a new tool applying these 9 rules, and detects 351 new bugs, of which 240 are confirmed.

Introduction

Reference counting bugs: the reference count is used to record the reference number of an object(similar to smart pointers in C++).

Potential risks: Memory leakage, UAF.

Read more
[Review] HEALER: Relation Learning Guided Kernel Fuzzing

[Review] HEALER: Relation Learning Guided Kernel Fuzzing

Link

The paper proposes a new technique called relation learning to help infer the relations between system calls when fuzzing the kernel.

Relation learning is achieved by constructing a relation graph, which is a two-dimensional graph with each cell representing the dependencies between two system calls.

The relation graph is built through static and dynamic learning. Static learning will infer the dependencies by analyzing the parameters and the return value of each system call. Dynamic learning will determine the dependencies by analyzing the generated minimized system call sequences.

Read more
[Review] Towards Precise Reporting of Cryptographic Misuses

[Review] Towards Precise Reporting of Cryptographic Misuses

Link here

The paper demonstrates an investigation into Java cryptographic misuse. To be brief, the paper does some research on current misuse detection techniques, analyzing the false positive cases and true positive cases they manifest. The paper discovers the root cause of high false positive rate and invalid true positive cases.

Introduction:

Many cryptographic misuse detection techniques have been proposed but with a high false positive rate. Additionally, many of the misuse alarms might not be very actionable to developers, and previous works might have overestimated the number of misuses and vulnerabilities.

Read more
[Review] How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications

[Review] How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications

Link here

The paper is a evaluation, which assesses the current runtime verification technology, and mainly the effectiveness of the existing API specifications.

Three conclusions:

  1. Current RV technology has matured enough with tolerable runtime overhead.
  2. Existing API specification can find many bugs that developers are willing to fix.
  3. The false alarm rates are quite high due to the ineffective specifications.
Read more
[LeetCode] 1531. String Compression II

[LeetCode] 1531. String Compression II

Link here

I thought it should be Greedy at first, but was stuck in Example 2. Then, I considered about DP, but found it hard to figure out how to build the larger solution from the sub-problem.

So, another time I reached out for solution.

We use dp[i][j] to represent the minimum length of the first i letters after compression and after j deletions.

It obvious that if we delete the i-th letter, and then dp[i][j] = dp[i-1][j-1].

If we don’t, we can scan the former string sequence to try to “combine” the letters.

Read more