[Review] Examining Zero-Shot Vulnerability Repair with Large Language Models



The paper tests how well LLMs repair security vulnerabilities in a zero-shot setting. It covers the same topic as "Automated Program Repair in the Era of Large Pre-trained Language Models", but it goes into more detail and uses a considerably more complicated program repair setting.

The paper draws two main conclusions:

  • LLMs can generate fixes for bugs.
  • However, their performance on real-world bugs is not yet good enough for practical use.

Background:

  • Security bugs are a significant problem.
  • LLMs are popular and have shown outstanding performance.

Implementation:

RQ1: Can off-the-shelf LLMs generate safe and functional code to fix security vulnerabilities?

RQ2: Does varying the amount of context in the comments of a prompt affect the LLM’s ability to suggest fixes?

RQ3: What are the challenges when using LLMs to fix vulnerabilities in the real world?

RQ4: How reliable are LLMs at generating repairs?
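RQ2 is about how much hint text the prompt's comments give the model before it writes the fix. As a rough illustration, the sketch below builds prompts with increasing context: no hint, a generic "security bug" note, a note naming the weakness class (CWE), and the CWE plus the original vulnerable line shown as a comment. The level names, comment wording, and the `build_repair_prompt` helper are hypothetical and are not the paper's actual prompt templates.

```python
# Hypothetical context levels for a repair prompt, from least to most guidance.
# These labels and comment strings are illustrative, not the paper's templates.
CONTEXT_LEVELS = ("none", "generic", "cwe", "cwe_and_line")


def build_repair_prompt(prefix: str, bad_line: str, cwe: str, level: str) -> str:
    """Return the code prefix plus optional hint comments for the model to continue."""
    if level not in CONTEXT_LEVELS:
        raise ValueError(f"unknown context level: {level!r}")
    hints = []
    if level == "generic":
        # Only say that something is wrong.
        hints.append("// FIXME: the next line has a security bug")
    elif level in ("cwe", "cwe_and_line"):
        # Also name the weakness class (CWE).
        hints.append(f"// FIXME: the next line has a {cwe} bug")
    if level == "cwe_and_line":
        # Additionally show the original vulnerable line as a comment.
        hints.append(f"// vulnerable version: {bad_line.strip()}")
    return "\n".join([prefix.rstrip("\n"), *hints]) + "\n"


if __name__ == "__main__":
    for lvl in CONTEXT_LEVELS:
        print(f"--- {lvl} ---")
        print(build_repair_prompt("char buf[8];", "strcpy(buf, input);", "CWE-787", lvl))
```

The model would then be asked to continue each prompt, so the only variable between runs is the amount of context in the comments.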

  • Apply seven LLMs: code-cushman-001, code-davinci-001, code-davinci-002 (OpenAI Codex), j1-large, j1-jumbo (AI21), gpt2-csrc (self-trained), and PolyCoder.

  • Synthetic experiments:

    • Synthesize buggy programs.
      • Manually write the beginning of each program.
      • Have the LLMs generate the rest of the program.
      • Each generated program is then checked for whether it is valid, compilable, vulnerable, functional, or safe.
    • Test the influence of the sampling parameters (temperature and top_p).
    • Apply the LLMs to repair the generated programs that turned out to be vulnerable.
    • Evaluate the repairs.
    • A more specific prompt does not always perform better, but on average the more specific prompts perform better.
    • The OpenAI Codex models consistently outperform the other models at generating successful patches. (This suggests Codex may be quite a good tool for program generation.)
  • Test repair on a hardware description language (Verilog).
    • LLMs were less proficient at producing Verilog code than C or Python code.
  • Test on real-world bugs.
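The synthetic-experiment loop in the list above boils down to: sample repairs under each (temperature, top_p) setting, check every candidate, and count those that are both safe and functional. The Python sketch below shows that bookkeeping under stated assumptions. The grid values, the `Candidate` record, and its three checks (the program compiles, functional tests pass, a security check still flags the bug) are hypothetical stand-ins, not the paper's actual harness or parameter values.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product

# Illustrative sampling grid (hypothetical values, not the paper's exact sweep).
TEMPERATURES = (0.0, 0.25, 0.5, 0.75, 1.0)
TOP_PS = (0.25, 0.5, 0.75, 1.0)


def sampling_grid():
    """Every (temperature, top_p) pair to sample repairs under."""
    return list(product(TEMPERATURES, TOP_PS))


@dataclass(frozen=True)
class Candidate:
    temperature: float
    top_p: float
    compiles: bool      # the patched program builds
    passes_tests: bool  # functional tests still pass
    flagged: bool       # the security check still reports the vulnerability


def classify(c: Candidate) -> str:
    """Bucket one generated repair; only 'safe_and_functional' counts as a fix."""
    if not c.compiles:
        return "not_compilable"
    if c.flagged:
        return "vulnerable"
    if not c.passes_tests:
        return "safe_not_functional"
    return "safe_and_functional"


def success_rates(candidates):
    """Fraction of safe-and-functional repairs per (temperature, top_p) setting."""
    totals = defaultdict(int)
    fixes = defaultdict(int)
    for c in candidates:
        key = (c.temperature, c.top_p)
        totals[key] += 1
        fixes[key] += int(classify(c) == "safe_and_functional")
    return {key: fixes[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Toy records for illustration only, not results from the paper.
    toy = [
        Candidate(0.5, 1.0, True, True, False),
        Candidate(0.5, 1.0, True, False, False),
        Candidate(0.5, 1.0, False, False, False),
        Candidate(1.0, 1.0, True, True, True),
    ]
    print(len(sampling_grid()), "settings")
    print(success_rates(toy))
```

In this sketch the security check runs before the functional check, so a candidate that passes its tests but is still flagged counts as vulnerable rather than as a fix.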




https://gax-c.github.io/blog/2023/11/21/24_paper_review_14/

Author

Gax

Posted on

2023-11-21

Updated on

2023-11-22
