[Review] Nuances are the Key: Unlocking ChatGPT to Find Failure-Inducing Tests with Differential Prompting
The paper explores the ability of ChatGPT (specifically ChatGPT, not LLMs in general) to find failure-inducing tests, and proposes a new method called Differential Prompting for this task. The method achieves a success rate of 75% on programs from QuixBugs and 66.7% on programs from Codeforces.
This approach may only be useful for small-scale programs (fewer than 100 LOC).
Background:
Failure-inducing tests are test cases that trigger bugs in a given program. Finding such tests is a central objective in software testing, but it is challenging in practice.
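To make the definition concrete, here is a minimal sketch (the buggy_max function and the inputs are my own hypothetical illustration, not taken from the paper): an input is failure-inducing only when it actually exposes the defect, which many inputs do not.

    def buggy_max(nums):
        # Bug: initialized with the *second* element, so the first
        # element is silently ignored when it is the unique maximum.
        largest = nums[1]
        for n in nums[1:]:
            if n > largest:
                largest = n
        return largest

    print(buggy_max([1, 5, 3]))  # 5 -- correct; this test hides the bug
    print(buggy_max([7, 5, 3]))  # 5 -- wrong (expected 7): failure-inducing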
Recently, applying LLMs (e.g., ChatGPT) to software engineering has become popular, but directly applying ChatGPT to this task is challenging and performs poorly, because ChatGPT is insensitive to nuances (i.e., subtle differences between two similar sequences of tokens). It is therefore hard for ChatGPT to identify bugs directly, since a bug is essentially a nuance between a buggy program and its fixed version.
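Differential Prompting works around this insensitivity: as I understand the pipeline, instead of asking ChatGPT to spot the nuance itself, it has ChatGPT infer the program's intention, generate a reference version from that intention, and generate candidate test inputs; failure-inducing tests are then found by mechanically cross-checking the two versions. A minimal sketch of that final differential-testing step, assuming the buggy_max example above (all names here are illustrative, not the paper's actual tooling):

    def find_failure_inducing_tests(program_under_test, reference, candidate_inputs):
        # An input is failure-inducing if the program under test and the
        # reference version disagree on it (a crash in either also counts).
        failures = []
        for inp in candidate_inputs:
            try:
                expected, actual = reference(inp), program_under_test(inp)
                if actual != expected:
                    failures.append((inp, expected, actual))
            except Exception as exc:
                failures.append((inp, "crash", repr(exc)))
        return failures

    # Cross-checking buggy_max against Python's built-in max as the reference:
    print(find_failure_inducing_tests(buggy_max, max, [[1, 5, 3], [7, 5, 3]]))
    # -> [([7, 5, 3], 7, 5)]

Note that the comparison step itself needs no model at all; ChatGPT is only used for the generation steps, which is exactly how the method avoids relying on the model's weak nuance sensitivity.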