[Review] On the Naturalness of Software

A classic paper showing that software, like natural language, has its own "naturalness": code is repetitive and predictable enough that statistical language models can drive code prediction and completion.

  • Natural languages are repetitive and predictable, which is why statistical approaches (NLP) can process them. Program code is also highly regular, even more so than natural language.
  • Using standard cross-entropy and perplexity measures, the paper demonstrates that an n-gram language model does capture the high-level statistical regularity that exists in software at the n-gram level (probabilistic chains of tokens).
  • These regularities are specific both to individual projects and to application domains.
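
To make the cross-entropy and perplexity measures concrete, here is a minimal sketch of a bigram model over code tokens. It is an illustration only, not the paper's setup: the whitespace tokenization, the add-one smoothing, and the function names (`train_bigram`, `cross_entropy`, `perplexity`) are all assumptions chosen for brevity. Lower cross-entropy means the model finds the token sequence more predictable.

```python
import math
from collections import Counter


def train_bigram(tokens):
    """Count contexts and bigrams from a token list (hypothetical helper)."""
    contexts = Counter(tokens[:-1])             # how often each token starts a bigram
    bigrams = Counter(zip(tokens, tokens[1:]))  # how often each adjacent pair occurs
    vocab = set(tokens)
    return contexts, bigrams, vocab


def cross_entropy(model, tokens):
    """Average negative log2-probability per token under the bigram model."""
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # one extra slot reserved for unseen tokens
    pairs = list(zip(tokens, tokens[1:]))
    total = 0.0
    for prev, cur in pairs:
        # Add-one smoothing (an assumption of this sketch) keeps unseen pairs nonzero.
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total += math.log2(p)
    return -total / len(pairs)


def perplexity(entropy_bits):
    """Perplexity is 2 raised to the cross-entropy measured in bits."""
    return 2 ** entropy_bits


loop = "for ( int i = 0 ; i < n ; i ++ ) { sum += a [ i ] ; }".split()
model = train_bigram(loop * 20)          # a tiny, highly repetitive "corpus"

h_regular = cross_entropy(model, loop)           # same token order as training
h_scrambled = cross_entropy(model, loop[::-1])   # reversed order, unfamiliar bigrams

print(f"regular:   H={h_regular:.2f} bits, PP={perplexity(h_regular):.2f}")
print(f"scrambled: H={h_scrambled:.2f} bits, PP={perplexity(h_scrambled):.2f}")
```

The familiar token order scores a lower cross-entropy than the reversed one, which is the same kind of comparison the review describes: a corpus counts as more "natural" to a model when the model is less surprised by it. Training on one project and scoring another would be the analogous way to probe the project-specific regularities noted in the last bullet.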