[Review] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models
This paper presents a new approach to replaying Android bugs. Specifically, it introduces a tool called AdbGPT that automatically reproduces bugs from bug reports. AdbGPT reproduces 81.3% of bug reports in 253.6 seconds, outperforming the state-of-the-art baselines and its ablated variants.
Background:
Bug reports often contain the steps to reproduce (S2Rs) a bug, which help developers replicate and fix it, albeit with considerable engineering effort.
Bug reports contain several steps, and it is difficult for pre-trained models to extract their features. LLMs are well suited to this task when given a few prior examples.
Implementation:
AdbGPT is divided into two phases:
S2R Entity Extraction phase: extract S2R entities defining each step to reproduce the bug report.
- provide few-shot examples, each with an S2R as input, a chain-of-thought as reasoning, and the final entities as output.
- input the target bug report to query for its S2R entities.
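The prompt structure described above could be sketched as follows. This is a minimal illustration, not the paper's actual prompt: the `Example` class, the `build_extraction_prompt` function, the instruction wording, the sample bug reports, and the bracketed entity format are all hypothetical. The call to the LLM itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One few-shot example: S2R text, reasoning, and extracted entities."""
    s2r: str
    reasoning: str  # chain-of-thought shown to the model
    entities: str   # final answer in the expected output format


def build_extraction_prompt(examples: list[Example], bug_report: str) -> str:
    """Assemble a few-shot, chain-of-thought prompt for S2R entity extraction."""
    parts = ["Extract the S2R entities from each bug report."]
    for ex in examples:
        parts.append(
            f"Bug report: {ex.s2r}\n"
            f"Reasoning: {ex.reasoning}\n"
            f"Entities: {ex.entities}"
        )
    # The new report ends with an open "Reasoning:" slot for the model to fill.
    parts.append(f"Bug report: {bug_report}\nReasoning:")
    return "\n\n".join(parts)


# Hypothetical example, written for illustration only.
examples = [
    Example(
        s2r="1. Open settings. 2. Tap 'Dark mode'.",
        reasoning="Step 1 opens the settings screen; step 2 taps a switch.",
        entities="[tap] [settings]; [tap] [Dark mode]",
    )
]
prompt = build_extraction_prompt(examples, "1. Tap 'Login'. 2. Enter a name.")
print(prompt)
```

Ending the prompt on an open reasoning slot nudges the model to produce its reasoning before the entities, mirroring the order used in the examples.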
Guided Replay phase: match the entities in S2R with the GUI states to repeat the bug reproduction steps.
- GUI encoding: encode the GUI view hierarchy as HTML.
- in-context examples: help AdbGPT understand the meaning of the HTML tags and learn how to handle missing steps.
- use the ChatGPT model.
- use Genymotion for running and controlling the virtual Android device, Android UIAutomator for dumping the GUI view hierarchy, and Android Debug Bridge (ADB) for replaying the steps.
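To make the GUI encoding and replay steps concrete, here is a rough sketch, assuming a view-hierarchy dump in the XML form that UIAutomator produces. The class-to-tag mapping, the `encode_gui` and `tap_command` helpers, and the sample dump are assumptions made for illustration and are not taken from the paper. The replay helper only builds an `adb shell input tap` command; it does not run it.

```python
import re
import xml.etree.ElementTree as ET
from html import escape

# Assumed mapping from Android widget classes to HTML tags (illustrative only).
TAG_BY_CLASS = {
    "android.widget.Button": "button",
    "android.widget.EditText": "input",
    "android.widget.TextView": "p",
}


def encode_gui(dump_xml: str) -> str:
    """Turn a UIAutomator view-hierarchy dump into a flat list of HTML-like tags."""
    lines = []
    for node in ET.fromstring(dump_xml).iter("node"):
        tag = TAG_BY_CLASS.get(node.get("class", ""))
        if tag is None:
            continue  # skip layout containers and unmapped widgets
        text = node.get("text") or node.get("content-desc") or ""
        lines.append(f"<{tag} id={len(lines)}>{escape(text)}</{tag}>")
    return "\n".join(lines)


def tap_command(bounds: str) -> list[str]:
    """Build an `adb shell input tap` command for the centre of a bounds string."""
    x1, y1, x2, y2 = map(int, re.findall(r"\d+", bounds))
    return ["adb", "shell", "input", "tap",
            str((x1 + x2) // 2), str((y1 + y2) // 2)]


# Hand-written sample dump, not captured from a real app.
SAMPLE_DUMP = """<hierarchy rotation="0">
  <node class="android.widget.FrameLayout" text="" bounds="[0,0][1080,1920]">
    <node class="android.widget.TextView" text="Name" bounds="[0,100][1080,200]"/>
    <node class="android.widget.EditText" text="" content-desc="name field" bounds="[0,200][1080,300]"/>
    <node class="android.widget.Button" text="Save" bounds="[400,400][680,500]"/>
  </node>
</hierarchy>"""

print(encode_gui(SAMPLE_DUMP))
print(tap_command("[400,400][680,500]"))
```

The numeric `id` on each tag gives the model a short handle for naming the component it wants to act on, which can then be mapped back to the node's bounds for the tap.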
Evaluation:
RQ1: How accurate is our approach in extracting S2R entities?
RQ2: How accurate is our approach in guiding bug replay?
RQ3: How efficient is our approach in bug replay?
RQ4: How useful is our approach for developers in real-world bug replay?
For RQ1:
- test the performance of AdbGPT in extracting S2R entities.
- compare it with baseline tools (ReCDroid, ReCDroid+, MaCa).
- run an ablation study:
  - without the pre-supplied examples, which turns the prompt into zero-shot.
  - without the intermediate reasoning (no chain-of-thought).
For RQ2:
- The same evaluation method as RQ1.
For RQ3:
- calculate the average time it takes for a bug report to pass through each of the two phases.
- 2.6 GHz MacBook Pro with 6 dedicated Intel Core CPU cores.
- official Android x86-64 emulator.
- The same evaluation method as RQ1.
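The per-phase averaging in RQ3 amounts to something like the sketch below. The `timings` values are made-up placeholders, not measurements from the paper, and the phase names and `average_phase_times` helper are hypothetical.

```python
from statistics import mean

# Made-up per-report timings in seconds; not results from the paper.
timings = [
    {"extraction": 2.0, "replay": 10.0},
    {"extraction": 4.0, "replay": 20.0},
]


def average_phase_times(records: list[dict[str, float]]) -> dict[str, float]:
    """Average the time spent in each phase across all bug reports."""
    phases = records[0].keys()
    return {phase: mean(r[phase] for r in records) for phase in phases}


averages = average_phase_times(timings)
print(averages)
```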
For RQ4:
- invite experienced Android developers to replay the bug reports manually, recording the time taken.
- replay the bug reports using AdbGPT.
- compare the time.
Future work:
- LLMs for software engineering!!!