[Review] On the Naturalness of Software
A classic paper showing that software, like natural language, has its own "naturalness", and laying the groundwork for statistical code prediction and completion.
- Natural language is repetitive and predictable, which is why statistical approaches (NLP) can process it. Program code is also highly regular, even more so than natural language.
- Using standard cross-entropy and perplexity measures, the authors demonstrate that an n-gram language model does capture the high-level statistical regularity that exists in software at the n-gram level (probabilistic chains of tokens).
- These regularities are specific both to individual projects and to application domains.
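
To make the cross-entropy and perplexity measures concrete, here is a minimal sketch of an n-gram model over code tokens. It is only an illustration and not the paper's setup: the whitespace tokenization, the add-one smoothing, the toy corpus, and the helper names (`train_ngram`, `cross_entropy`, `perplexity`) are all assumptions made for brevity.

```python
import math
from collections import Counter


def train_ngram(tokens, n):
    """Count n-grams and their (n-1)-token contexts in a token list."""
    ngrams, contexts = Counter(), Counter()
    padded = ["<s>"] * (n - 1) + list(tokens)
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        ngrams[ctx + (padded[i],)] += 1
        contexts[ctx] += 1
    return ngrams, contexts


def cross_entropy(tokens, ngrams, contexts, n, vocab_size):
    """Average negative log2 probability per token (bits per token).

    Uses add-one smoothing, an assumption chosen for simplicity.
    """
    padded = ["<s>"] * (n - 1) + list(tokens)
    total = 0.0
    for i in range(n - 1, len(padded)):
        ctx = tuple(padded[i - n + 1:i])
        count = ngrams[ctx + (padded[i],)]
        prob = (count + 1) / (contexts[ctx] + vocab_size)
        total -= math.log2(prob)
    return total / len(tokens)


def perplexity(entropy_bits):
    """Perplexity is 2 raised to the cross-entropy in bits."""
    return 2 ** entropy_bits


# Toy corpus (hypothetical): one loop repeated many times, split on whitespace.
snippet = "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; }".split()
train = snippet * 20
vocab_size = len(set(train)) + 1  # +1 slot reserved for unseen tokens

n = 3
ngrams, contexts = train_ngram(train, n)

# Same tokens in a familiar order versus an unfamiliar (reversed) order.
h_seen = cross_entropy(snippet, ngrams, contexts, n, vocab_size)
h_reversed = cross_entropy(snippet[::-1], ngrams, contexts, n, vocab_size)

print(f"familiar order: {h_seen:.2f} bits/token, perplexity {perplexity(h_seen):.1f}")
print(f"reversed order: {h_reversed:.2f} bits/token, perplexity {perplexity(h_reversed):.1f}")
```

The familiar ordering scores a lower cross-entropy than the reversed one, which is the sense in which a lower value means the model finds the text more predictable, or "natural". A real experiment would use a proper lexer, a much larger corpus, and a stronger smoothing method.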