Completing Function Documentation Comments Using Structural Information

Bibliographic Details
Published in	Empirical software engineering : an international journal Vol. 28; no. 4; p. 86
Main Authors	Ciurumelea, Adelina; Alexandru, Carol V.; Gall, Harald C.; Proksch, Sebastian
Format	Journal Article
Language	English
Published	New York: Springer US, 01.07.2023; Springer Nature B.V
Subjects	Compilers; Computer Science; Documentation; Empirical analysis; Interpreters; Programming Languages; Software Engineering/Programming and Operating Systems; Source code; Structural members; Comment completion; Javadocs; Python documentation strings; Neural language models
Summary:	Source code comments are a cornerstone of software documentation, facilitating feature development and maintenance. Well-defined documentation formats, like Javadoc, make it easy to include structural metadata that is used, for example, to generate documentation manuals. However, the actual usage of structural elements in source code comments has not been studied yet. We investigate to what extent these structural elements are used in practice and whether the added information can be leveraged to improve tools that assist developers when writing comments. Existing research on comment generation traditionally focuses on the automatic generation of summaries. However, recent works have shown promising results when supporting comment authoring through next-word prediction. In this paper, we present an in-depth analysis of commenting practice in more than 18K open-source projects written in Python and Java, showing that many structural elements, particularly parameter and return value descriptions, are indeed widely used. We discover that while a majority are rather short at about 6 to 9 words, many are several hundred words in length. We further find that Python comments tend to be significantly longer than Java comments, possibly due to the weakly typed nature of the former. Following the empirical analysis, we extend an existing language model with support for structural information, substantially improving the Top-1 accuracy of predicted words (Python 9.6%, Java 7.8%).
ISSN:	1382-3256; 1573-7616
DOI:	10.1007/s10664-022-10284-6
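
Illustration:	The summary refers to structural elements such as parameter and return value descriptions and to their length in words. As a rough sketch of what that means for a Python documentation string, the snippet below extracts single-line `:param` and `:returns:` fields and counts the words in each description. The example docstring, the function name `structural_elements`, and the choice of the reStructuredText field-list style are assumptions made for illustration only; this is not the tooling used in the article, and other docstring styles would need different patterns.

```python
import re

# Hypothetical example docstring in reStructuredText field-list style;
# it is an illustration and is not taken from the article's dataset.
EXAMPLE_DOCSTRING = """Scale a value by a constant factor.

:param value: the number to scale
:param factor: multiplier applied to the value
:returns: the scaled number
"""

# Matches ":param name: text" and ":return:" / ":returns: text" on a single line.
# Multi-line descriptions are deliberately not handled in this sketch.
FIELD_PATTERN = re.compile(
    r"^[ \t]*:(param|returns?)(?:[ \t]+(\w+))?:[ \t]*(.*)$",
    re.MULTILINE,
)


def structural_elements(docstring):
    """Return a (kind, name, word_count) tuple for each single-line field."""
    elements = []
    for kind, name, text in FIELD_PATTERN.findall(docstring):
        # Normalize ":return:" and ":returns:" to one kind.
        kind = "return" if kind.startswith("return") else "param"
        elements.append((kind, name or None, len(text.split())))
    return elements


if __name__ == "__main__":
    for element in structural_elements(EXAMPLE_DOCSTRING):
        print(element)
```

Counting words by splitting on whitespace is a simplification; a real analysis would need to settle on a tokenization and handle descriptions that continue over several lines.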