[Review] How Good Are the Specs? A Study of the Bug-Finding Effectiveness of Existing Java API Specifications

Link here

The paper is an evaluation study that assesses current runtime verification (RV) technology, focusing mainly on the effectiveness of existing API specifications.

Three conclusions:

  1. Current RV technology has matured enough, with tolerable runtime overhead.
  2. Existing API specifications can find many bugs that developers are willing to fix (see the sketch after this list).
  3. The false alarm rate is quite high due to ineffective specifications.
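
To make the kind of specification being evaluated concrete, here is a minimal, hand-written sketch of the classic "HasNext" property (call `hasNext()` before `next()` on an `Iterator`). Real RV tools such as JavaMOP generate and weave such monitors from declarative specs; the class and method names below are purely illustrative, not the paper's tooling.

```java
import java.util.Iterator;
import java.util.List;

// Hand-rolled monitor for the "HasNext" API specification:
// next() should only be called after hasNext() has returned true.
final class HasNextMonitor<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private boolean hasNextChecked = false;

    HasNextMonitor(Iterator<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean hasNext() {
        boolean result = delegate.hasNext();
        hasNextChecked = result;   // remember the last observed answer
        return result;
    }

    @Override
    public T next() {
        if (!hasNextChecked) {
            // An RV tool would report this as a potential API-misuse bug.
            System.err.println("Spec violation: next() called without a preceding hasNext()");
        }
        hasNextChecked = false;    // the guarantee is consumed by one next()
        return delegate.next();
    }

    public static void main(String[] args) {
        Iterator<Integer> it = new HasNextMonitor<>(List.of(1, 2, 3).iterator());
        while (it.hasNext()) {
            System.out.println(it.next());   // correct usage: no violation reported
        }
    }
}
```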
[Review] Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting

Link here

The paper explores the ability of ChatGPT (specifically ChatGPT, not LLMs in general) to find failure-inducing tests, and proposes a new method called Differential Prompting for this task. It achieves a success rate of 75% on programs from QuixBugs and 66.7% on programs from Codeforces.

This approach may only be useful for small-scale programs (fewer than 100 LOC).

Background:

Failure-inducing tests are test cases that trigger bugs in a given program. Finding such tests is a major objective in software engineering, but it is challenging in practice.
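
As a concrete (made-up) illustration of the concept, the program and test input below are hypothetical: a failure-inducing test is an input on which the program's actual output differs from its intended behavior.

```java
public class FailureInducingExample {
    // Intended behavior: return the absolute value of x.
    // Bug: Integer.MIN_VALUE is not handled (negating it overflows).
    static int buggyAbs(int x) {
        return x < 0 ? -x : x;
    }

    public static void main(String[] args) {
        System.out.println(buggyAbs(-5));                 // 5 -> ordinary test, passes
        // Failure-inducing test: a non-negative result is expected,
        // but the buggy program returns a negative number.
        System.out.println(buggyAbs(Integer.MIN_VALUE));  // -2147483648 -> fails
    }
}
```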

Recently, applying LLMs (e.g., ChatGPT) to software engineering has become popular, but directly applying ChatGPT to this task is challenging and performs poorly, because ChatGPT is insensitive to nuances (i.e., subtle differences between two similar sequences of tokens). This makes it hard for ChatGPT to identify bugs, because a bug is essentially a nuance between a buggy program and its fixed version.
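A minimal sketch of the differential step that this style of approach relies on, as I understand it: run candidate test inputs on the program under test and on a reference version (in Differential Prompting, a version regenerated from the program's inferred intention), and report inputs whose outputs diverge as failure-inducing. The two programs and the candidate inputs below are hypothetical, not taken from the paper.

```java
import java.util.List;

public class DifferentialCheck {

    // Program under test: intended to return 1 + 2 + ... + n.
    // The bug is a subtle "nuance": the loop bound should be i <= n.
    static int sumUpToBuggy(int n) {
        int sum = 0;
        for (int i = 1; i < n; i++) sum += i;
        return sum;
    }

    // Reference version (e.g., regenerated from the inferred intention).
    static int sumUpToReference(int n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        List<Integer> candidateInputs = List.of(0, 1, 2, 5, 10);
        for (int n : candidateInputs) {
            int got = sumUpToBuggy(n);
            int expected = sumUpToReference(n);
            if (got != expected) {
                // Divergent output: this input is reported as failure-inducing.
                System.out.printf("Failure-inducing test: n=%d (got %d, expected %d)%n",
                        n, got, expected);
            }
        }
    }
}
```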
