[Review] On the Naturalness of Software

A classic paper showing that software, like natural language, has its own "naturalness": it is repetitive and predictable enough to be modeled statistically. It lays the groundwork for statistical code prediction and completion.

  • Natural language is repetitive and predictable, which is why statistical approaches (NLP) can process it. Source code is also highly regular, even more so than natural language.
  • Using standard cross-entropy and perplexity measures, the paper shows that a simple n-gram language model (a probabilistic chain of tokens) captures the high-level statistical regularity that exists in software.
  • These regularities are specific both to individual projects and to application domains.
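To make the cross-entropy and perplexity measures concrete, here is a minimal sketch of an n-gram model over code tokens. It is only an illustration: the function names, the `<s>` padding token, the `<unk>` token, and the add-one smoothing are my own assumptions chosen for brevity, not the setup used in the paper. Cross-entropy is the average negative log2 probability the model assigns to each token, and perplexity is 2 raised to that value, so lower numbers mean more predictable text.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # Pad the start so every token has a full (n-1)-token context.
    padded = ["<s>"] * (n - 1) + list(tokens)
    return [tuple(padded[i:i + n]) for i in range(len(tokens))]

def train(tokens, n):
    # Count n-grams and their (n-1)-token contexts.
    grams = Counter(ngrams(tokens, n))
    contexts = Counter()
    for gram, count in grams.items():
        contexts[gram[:-1]] += count
    vocab = set(tokens) | {"<unk>"}
    return grams, contexts, vocab

def cross_entropy(model, tokens, n):
    # Average bits per token under the model (add-one smoothing, an assumption).
    grams, contexts, vocab = model
    known = [t if t in vocab else "<unk>" for t in tokens]
    total = 0.0
    for gram in ngrams(known, n):
        p = (grams[gram] + 1) / (contexts[gram[:-1]] + len(vocab))
        total -= math.log2(p)
    return total / len(known)

def perplexity(model, tokens, n):
    return 2 ** cross_entropy(model, tokens, n)
```

Training this on a repetitive token stream (for example, many copies of a `for` loop header) and then scoring one more copy should give a much lower cross-entropy than scoring the same tokens in reversed order, which is the kind of gap the paper's measurements rely on.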

Implementation & Evaluation:

  • Implements the n-gram language model as an Eclipse plug-in for code completion and demonstrates its effectiveness.
  • Compares the plug-in with Eclipse's built-in completion facilities.
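The idea behind n-gram completion can be sketched in a few lines: look at the last n-1 tokens and rank the tokens that most often followed them in the training corpus. This is a hypothetical toy, not the plug-in's actual code; `build_suggester` and `suggest` are names I made up, and a real tool would add smoothing and a proper lexer.

```python
from collections import Counter, defaultdict

def build_suggester(tokens, n=3):
    # Map each (n-1)-token context to a counter of the tokens that followed it.
    # Assumes n >= 2.
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def suggest(table, prefix, n=3, k=3):
    # Return up to k candidate next tokens, most frequent first.
    context = tuple(prefix[-(n - 1):])
    followers = table.get(context, Counter())
    return [token for token, _ in followers.most_common(k)]
```

For example, after training on repeated `for ( int i = 0 ; ...` headers, `suggest(table, ["for", "("])` would propose `int`, while an unseen context returns an empty list.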

Future work:

  • More sophisticated language models for code prediction (e.g., LLMs).
  • Naturalness may not be limited to the code itself; deeper properties of software may exhibit it as well.



Author

Gax

Posted on

2023-11-14

Updated on

2023-11-15
