[Review] Automated Program Repair in the Era of Large Pre-trained Language Models
The paper presents the first extensive evaluation of recent LLMs for fixing bugs in real-world projects. It assesses how effective Automated Program Repair (APR) is in the era of LLMs.
Several conclusions were drawn:
- As model size increases, the number of correct and plausible patches generated also increases.
- Successfully utilizing the code after the buggy lines is important for fixing bugs.
- While LLMs can perform fault localization and repair in one shot, for real-world software systems it is still more cost-effective to first use traditional fault localization techniques to pinpoint the precise bug locations and then leverage LLMs for more targeted patch generation.
- By directly applying LLMs for APR without any specific change/finetuning, we can already achieve the highest number of correct fixes compared to existing baselines.
- Entropy computation via LLMs can help distinguish correct patches from plausible patches.
- Sum entropy performs slightly better than mean entropy.
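
To make the last two points concrete, here is a minimal sketch of entropy-based patch ranking. It assumes entropy is taken as the negative log-probability the model assigns to each token of a patch, with sum entropy adding these values and mean entropy averaging them; lower scores are treated as more "natural". The function names and the per-token probabilities are hypothetical and made up for illustration, not taken from the paper or any library.

```python
import math


def token_entropies(token_probs):
    """Negative log-probability of each patch token (assumed entropy definition)."""
    return [-math.log(p) for p in token_probs]


def sum_entropy(token_probs):
    """Total entropy of a patch; grows with patch length."""
    return sum(token_entropies(token_probs))


def mean_entropy(token_probs):
    """Average entropy per token; independent of patch length."""
    entropies = token_entropies(token_probs)
    return sum(entropies) / len(entropies)


def rank_patches(patches, score=sum_entropy):
    """Return patch names ordered from lowest to highest entropy."""
    return sorted(patches, key=lambda name: score(patches[name]))


# Hypothetical per-token probabilities for two plausible patches.
patches = {
    "short_patch": [0.5, 0.5],   # few tokens, each fairly uncertain
    "long_patch": [0.8] * 8,     # many tokens, each fairly confident
}

print(rank_patches(patches, score=sum_entropy))
print(rank_patches(patches, score=mean_entropy))
```

The two scores can disagree: sum entropy penalizes longer patches because every extra token adds to the total, while mean entropy normalizes by length. In this toy input the short patch ranks first under sum entropy and the long patch ranks first under mean entropy.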