[Review] On the Naturalness of Software
A classic paper showing that software, like natural language, has its own "naturalness", and laying the groundwork for statistical code prediction and completion.
- Natural languages are repetitive and predictable, so they can be processed with statistical approaches (NLP). Program code is also highly regular, even more so than natural language.
- Using standard cross-entropy and perplexity measures, the paper demonstrates that a simple n-gram language model captures the high-level statistical regularity of software at the n-gram level (probabilistic chains of tokens).
- These regularities are specific both to individual projects and to application domains.
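To make the cross-entropy and perplexity measures concrete, here is a minimal sketch of a bigram model scored on two token sequences. It is not the paper's implementation: the function names, the add-one smoothing, and the toy corpus are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count contexts and bigrams over a token sequence (hypothetical helper)."""
    contexts = Counter(tokens[:-1])            # tokens that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))
    return contexts, bigrams

def cross_entropy(tokens, contexts, bigrams, vocab_size):
    """Average negative log2 probability per predicted token (bits/token).

    Add-one smoothing is an assumption of this sketch, used so that
    unseen bigrams still receive a non-zero probability.
    """
    total = 0.0
    count = 0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        total -= math.log2(p)
        count += 1
    return total / count

def perplexity(entropy_bits):
    """Perplexity is 2 raised to the cross-entropy in bits."""
    return 2 ** entropy_bits

# Toy corpus: the same loop repeated, standing in for repetitive code.
loop = "for ( int i = 0 ; i < n ; i ++ ) { sum += i ; }".split()
contexts, bigrams = train_bigram(loop * 20)
vocab_size = len(set(loop))

h_regular = cross_entropy(loop, contexts, bigrams, vocab_size)
h_scrambled = cross_entropy(loop[::-1], contexts, bigrams, vocab_size)
print(h_regular, perplexity(h_regular))
print(h_scrambled, perplexity(h_scrambled))
```

The familiar token order scores a lower cross-entropy than the reversed one, which is the sense in which a lower value indicates more "natural" (predictable) code.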
Implementation & Evaluation:
- Implement an n-gram language model as an Eclipse plug-in for code completion and demonstrate its effectiveness.
- Compare it with Eclipse's built-in completion facilities.
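The idea of n-gram based completion can be sketched as follows: look up the last few tokens typed and rank the tokens that most often followed that context in the training corpus. This is only an illustrative sketch, not the plug-in described in the paper; `train_ngram`, `suggest`, and the toy corpus are hypothetical names and data, and no smoothing or back-off is applied.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n=3):
    """Map each (n-1)-token context to a Counter of the tokens that followed it.

    Assumes n >= 2 so that every prediction has a non-empty context.
    """
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def suggest(model, prefix, n=3, k=3):
    """Return up to k candidate next tokens, most frequent first."""
    context = tuple(prefix[-(n - 1):])
    followers = model.get(context, Counter())  # unseen context: no suggestions
    return [token for token, _ in followers.most_common(k)]

# Toy training corpus standing in for a project's source code.
corpus = "for ( int i = 0 ; i < n ; i ++ ) { total += i ; }".split() * 5
model = train_ngram(corpus, n=3)

print(suggest(model, "for (".split()))
print(suggest(model, "unseen context".split()))
```

A real completion engine would combine such suggestions with type-based ones, which is roughly the comparison the evaluation makes against the IDE's built-in facilities.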
Future work:
- More sophisticated language models for code prediction (e.g., LLMs).
- Naturalness may not be limited to the code itself; deeper properties of software may exhibit it as well.