[Review] Automated Program Repair in the Era of Large Pre-trained Language Models

The paper presents the first extensive evaluation of recent LLMs for fixing bugs in real-world projects, assessing the effectiveness of Automated Program Repair (APR) in the era of LLMs.

Several conclusions were drawn:

  • As model size increases, so does the number of correct and plausible patches generated.
  • Successfully utilizing the code after the buggy lines is important for fixing bugs.
  • While LLMs can perform fault localization and repair in one shot, for real-world software systems it is still more cost-effective to first use traditional fault localization techniques to pinpoint the precise bug locations and then leverage LLMs for more targeted patch generation.
  • By directly applying LLMs to APR without any task-specific changes or fine-tuning, we can already achieve more correct fixes than existing baselines.
  • Entropy computation via LLMs can help distinguish correct patches from plausible patches.
  • Sum entropy performs slightly better compared to mean entropy.
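The entropy ranking in the last two points can be sketched as follows. The token probabilities below are hypothetical stand-ins for the per-token probabilities an LLM would assign to a generated patch:

```python
import math

def patch_entropies(token_probs):
    """Sum and mean entropy (negative log-probability) of a patch.

    token_probs are hypothetical per-token probabilities; in the paper
    they come from an LLM scoring the tokens of a generated patch.
    """
    neg_log_probs = [-math.log(p) for p in token_probs]
    return sum(neg_log_probs), sum(neg_log_probs) / len(neg_log_probs)

# A patch the model finds natural (high token probabilities) gets lower
# entropy than a surprising one, so it is ranked as more likely correct.
natural = patch_entropies([0.9, 0.8, 0.95])
surprising = patch_entropies([0.3, 0.2, 0.4])
assert natural[0] < surprising[0] and natural[1] < surprising[1]
```

Sum entropy additionally penalizes longer patches, which may partly explain why it edges out mean entropy in the paper's experiments.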
[Review] Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm

The paper discusses prompt engineering, focusing mainly on GPT-3, and compiles several prompt-engineering approaches.

Background:

The recent rise of massive self-supervised language models such as GPT-3 has renewed interest in prompt engineering. For such models, 0-shot prompts may significantly outperform few-shot prompts, which again underscores the importance of prompt engineering.

Some facts:

  • 0-shot may outperform few-shot: the model does not treat the examples as a categorical guide to the task; rather, it infers the task from their semantic content, so examples mainly serve to locate a task the model already knows.
  • GPT-3 resembles not a single human author but a superposition of authors.
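As a concrete illustration of the two prompt styles (English-to-French translation is a common GPT-3 demonstration task; the exact wording here is made up):

```python
# Few-shot: demonstrations frame the task implicitly.
few_shot = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "cheese =>"
)

# 0-shot: the task is stated directly in natural language; the paper
# argues such prompts can locate a task the model already knows.
zero_shot = "Translate English to French:\ncheese =>"
```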
[Review] Large Language Models are Zero-Shot Fuzzers: Fuzzing Deep-Learning Libraries via Large Language Models

The paper proposes a new approach that leverages LLMs to generate input programs for fuzzing DL libraries. More specifically, it applies LLMs (Codex and InCoder) to fuzz DL libraries (PyTorch and TensorFlow).

Background:

  • Previous work on fuzzing DL libraries mainly falls into two categories: API-level fuzzing and model-level fuzzing. They still have some limitations.
  • Model level fuzzers attempt to leverage complete DL models (which cover various sets of DL library APIs) as test inputs. But due to the input/output constraints of DL APIs, model-level mutation/generation is hard to perform, leading to a limited number of unique APIs covered.
  • API-level fuzzing focuses on finding bugs within a single API at a time. But API-level fuzzers cannot detect any bug that arises from interactions within a complex API sequence.
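A minimal sketch of the execution side of such an LLM-based fuzzer, assuming the generated programs arrive as Python source strings; the paper's real oracle also compares CPU vs. GPU results, which is omitted here:

```python
import subprocess
import sys
import tempfile

def run_generated_program(code: str) -> str:
    """Run an LLM-generated test program in a subprocess and classify
    the outcome (simplified oracle: crash vs. exception vs. pass)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, timeout=30)
    if proc.returncode < 0:
        return "crash"        # killed by a signal: a potential library bug
    if proc.returncode != 0:
        return "exception"    # invalid program or a rejected input
    return "pass"
```

A generated program would normally import torch or tensorflow and chain several API calls; treating a crash as interesting and an exception as likely-invalid input mirrors common fuzzing practice.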
[Review] On the Naturalness of Software

A classic paper showing that software, like natural language, has its own naturalness, laying the foundation for code prediction and completion.

  • Natural languages are repetitive and predictable, so they can be processed with statistical approaches (NLP). Program code is also very regular, even more so than natural language.
  • Demonstrates, using standard cross-entropy and perplexity measures, that an n-gram model (probabilistic chains of tokens) indeed captures the high-level statistical regularity that exists in software.
  • Regularities are specific both to projects and to application domains.
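The cross-entropy measure can be sketched with a tiny add-k smoothed bigram model (the corpus and smoothing constant here are made up; the paper estimates larger n-gram models over real codebases):

```python
import math
from collections import Counter

def cross_entropy(tokens, train_tokens, k=0.01):
    """Average per-token cross-entropy (bits) under an add-k smoothed
    bigram model estimated from train_tokens."""
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    unigrams = Counter(train_tokens)
    vocab = len(set(train_tokens))
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + k) / (unigrams[prev] + k * vocab)
        total -= math.log2(p)
    return total / max(len(tokens) - 1, 1)

corpus = "for i in range ( n ) : total += i".split()
# A sequence resembling the corpus is less surprising (lower entropy)
# than a scrambled one: the regularity the paper measures.
assert cross_entropy("for i in range".split(), corpus) < \
       cross_entropy("range for n total".split(), corpus)
```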
[Review] Titan: Efficient Multi-target Directed Greybox Fuzzing

The paper presents a multi-target fuzzing method, which fuzzes different targets at the same time.

Titan is proposed for this purpose: it enables the fuzzer to recognize correlations between the various targets in the program and, guided by these correlations, optimizes input generation to fuzz different targets simultaneously and efficiently.


Introduction:

In practice, more than 1,000 potential targets may need verification, which is costly. Current directed fuzzing aims at only one target at a time, lowering verification efficiency, and running multiple instances to fuzz multiple targets is also 3.6x slower than sequentially applying a single instance to one target at a time.

[Review] Whole Test Suite Generation

The paper presents a Genetic Algorithm (GA) in which whole test suites are evolved with the aim of covering all coverage goals at the same time.

  • Whole test suite generation achieves higher coverage than single branch test case generation.
  • Whole test suite generation produces smaller test suites than single branch test case generation.

http://www.evosuite.org

Background:

  • Prior work targets only one coverage goal at a time.
  • Engineers must manually write assertions for every test case, so test cases should be as short as possible (while still satisfying the coverage goals).
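A toy version of the whole-suite GA; the coverage goals and GA parameters are invented for illustration (EvoSuite's real fitness function uses branch distances, not a simple goal count):

```python
import random

# Each "test" is an integer input; each goal is a branch predicate.
GOALS = [lambda x: x < 0, lambda x: x == 0,
         lambda x: 0 < x < 10, lambda x: x >= 10]

def fitness(suite):
    """Number of goals the whole suite covers simultaneously."""
    return sum(any(g(t) for t in suite) for g in GOALS)

def evolve(pop_size=20, suite_len=4, gens=50, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(-20, 20) for _ in range(suite_len)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        for s in survivors:
            child = list(s)
            child[rng.randrange(suite_len)] = rng.randint(-20, 20)  # mutate one test
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)
```

Because fitness rewards covering all goals at once, a single short suite evolves toward full coverage instead of one test per branch.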
[Review] Feedback-directed Random Test Generation

The paper presents a technique to improve random test generation by incorporating feedback obtained from executing test inputs as they are created.

The technique targets object-oriented code: it exposes potential faults in objects (e.g., Java classes) by generating sequences of method calls that explore for bugs.

Background

Random testing is inefficient and may generate useless and redundant test sequences; RANDOOP is proposed to address this problem.

Implementation

  • Randomly select method sequences that have already executed without error, and extend them with a new call.
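The feedback loop can be sketched against a hypothetical stack under test: extend a previously legal call sequence by one call, execute it, and keep it for reuse only if it runs without error:

```python
import random

# Minimal sketch in the spirit of RANDOOP (the stack "class under test"
# and its two operations are made up).
OPS = {"push": lambda s, rng: s.append(rng.randint(0, 9)),
       "pop": lambda s, rng: s.pop()}

def generate(num_sequences=50, seed=1):
    rng = random.Random(seed)
    good, failing = [], []
    for _ in range(num_sequences):
        base = rng.choice(good) if good and rng.random() < 0.5 else []
        seq = base + [rng.choice(sorted(OPS))]   # extend by one call
        stack = []
        try:
            for name in seq:                     # replay the whole sequence
                OPS[name](stack, rng)
            good.append(seq)                     # feedback: legal, reusable
        except IndexError:
            failing.append(seq)                  # e.g. pop on an empty stack
    return good, failing
```

Failing sequences such as pop-on-empty are set aside rather than extended, which is what prevents the redundant illegal sequences plain random testing keeps producing.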
[Review] autofz: Automated Fuzzer Composition at Runtime

This paper proposes a new fuzzing mechanism that integrates several fuzzers into a single fuzzing process. For every workload, an optimal mixture of one or more fuzzers is employed. Unlike earlier work, autofz:

  1. Needs no presetting or manual effort.
  2. Allocates fuzzers per workload rather than per program.

Background:

  • A large number of fuzzers have been created, making it difficult to choose the right one for a specific fuzzing task.
  • No universal fuzzer consistently outperforms the others, so choosing an optimal fuzzer is difficult.
  • A fuzzer's effectiveness may not last for the whole fuzzing process.
  • Fuzzing is a random process, so a fuzzer that is optimal at one point may not remain so.
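The core allocation idea can be sketched as proportional time-slicing. The fuzzer names, trend numbers, and budget below are made up; autofz measures the trends during short preparation rounds before each focus round:

```python
def allocate(trends, budget=60):
    """Split a time budget (seconds) among fuzzers in proportion to
    their recently observed progress (e.g. coverage growth)."""
    total = sum(trends.values())
    if total == 0:                 # nobody is progressing: split evenly
        return {f: budget / len(trends) for f in trends}
    return {f: budget * g / total for f, g in trends.items()}

# The fuzzer that progressed most gets the largest share of the next round.
shares = allocate({"afl": 30, "libfuzzer": 60, "honggfuzz": 10})
assert shares["libfuzzer"] == 36.0
```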
Software Analysis Basics

Background and Basics

  • Test oracle: a mechanism for determining whether software executed correctly for a test.

  • Differential testing: Provide the same input to similar applications, and observe output differences.

  • Metamorphic testing: Provide transformed inputs to the same application, and check whether the output differences match the expected relation.
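The two oracles can be illustrated with toy checks; the sort pair and the sine relation are standard textbook examples, not tied to any particular paper:

```python
import heapq
import math

def heap_sort(xs):
    h = list(xs)
    heapq.heapify(h)
    return [heapq.heappop(h) for _ in xs]

# Differential testing: two independent implementations, same input,
# compare outputs.
def differential_check(xs):
    return heap_sort(xs) == sorted(xs)

# Metamorphic testing: one implementation, a known relation between
# outputs on transformed inputs: sin(x) = sin(pi - x) for all x.
def metamorphic_check(x):
    return math.isclose(math.sin(x), math.sin(math.pi - x), abs_tol=1e-12)

assert differential_check([3, 1, 2, 1])
assert all(metamorphic_check(x / 10) for x in range(-50, 50))
```

The metamorphic relation needs no second implementation and no expected output, which is exactly what makes it useful when no test oracle exists.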

Program Analysis Basics

  • Abstract syntax tree (AST): Represents the abstract syntactic structure of a language construct.

  • Control flow graph(CFG):

    • Divide the program into basic blocks.
    • Basic blocks: A sequence of straight-line code that can be entered only at the beginning and exited at the end.
    • Connect basic blocks together to generate CFG.
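The basic-block splitting step can be sketched for a tiny made-up jump-based instruction list, using the standard leader rule (the first instruction, every jump target, and every instruction following a branch start a new block):

```python
def basic_blocks(instrs):
    """Split a list of (opcode, ...) tuples into basic blocks."""
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] in ("jmp", "br"):
            leaders.add(ins[1])          # jump target starts a block
            if i + 1 < len(instrs):
                leaders.add(i + 1)       # instruction after a branch
    cuts = sorted(leaders) + [len(instrs)]
    return [instrs[a:b] for a, b in zip(cuts, cuts[1:])]

# A hypothetical 4-instruction program: the branch at index 1 ends the
# first block, and its target (index 3) starts a new one.
prog = [("load", "x"), ("br", 3), ("add", 1), ("ret",)]
blocks = basic_blocks(prog)
assert len(blocks) == 3
```

Connecting these blocks along jump and fall-through edges then yields the CFG.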
[Review] A Large-Scale Empirical Analysis of the Vulnerabilities Introduced by Third-Party Components in IoT Firmware

This paper does not propose a new technique, but builds a system called FirmSec that detects third-party components (TPCs) in firmware at the version level and then identifies their corresponding vulnerabilities. FirmSec takes IoT firmware images as input and outputs the vulnerabilities of the TPCs contained in each image.

Their work also contributes a dataset of 34,136 firmware images: FirmSecDataset

Implementation:

  • Build a firmware database by gathering various firmware images, both public and private.

  • Build a TPC database by gathering various TPCs and their known vulnerabilities.

  • Take in a firmware image, extract its features, and determine the TPCs (at the version level) contained in it.

  • Generate the vulnerability report of the firmware.
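The version-level matching step might look like the following sketch. The string pattern, TPC list, and one-entry vulnerability table are illustrative only (CVE-2014-0160 is Heartbleed, which does affect OpenSSL 1.0.1):

```python
import re

# Hypothetical vulnerability table: (tpc, version) -> known CVEs.
VULN_DB = {("openssl", "1.0.1"): ["CVE-2014-0160"]}

def detect_tpcs(firmware_strings):
    """Match TPC name/version patterns in strings extracted from a
    firmware image (pattern and TPC names are made up)."""
    found = []
    for s in firmware_strings:
        m = re.search(r"(openssl|busybox)[ /-]v?(\d+\.\d+(?:\.\d+)?[a-z]?)",
                      s, re.I)
        if m:
            found.append((m.group(1).lower(), m.group(2)))
    return found

def report(firmware_strings):
    """Map each detected (tpc, version) to its known vulnerabilities."""
    return {tpc: VULN_DB.get(tpc, []) for tpc in detect_tpcs(firmware_strings)}
```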
