A Neural Model for Generating Natural Language Summaries of Program Subroutines
Published in | Proceedings / International Conference on Software Engineering, pp. 795-806 |
---|---|
Main Authors | LeClair, Alexander; Jiang, Siyuan; McMillan, Collin |
Format | Conference Proceeding |
Language | English |
Published | IEEE, May 2019 |
Abstract | Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independently of the text in code. This process helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1 million Java methods. We find improvements over two baseline techniques from the SE literature and one from the NLP literature. |
---|---|
Author | McMillan, Collin; Jiang, Siyuan; LeClair, Alexander |
Affiliations | 1. LeClair, Alexander (University of Notre Dame); 2. Jiang, Siyuan (Eastern Michigan University); 3. McMillan, Collin (University of Notre Dame) |
CODEN | IEEPAD |
DOI | 10.1109/ICSE.2019.00087 |
Discipline | Computer Science |
EISBN | 9781728108698 (ISBN-13); 1728108691 (ISBN-10) |
EISSN | 1558-1225 |
EndPage | 806 |
ExternalDocumentID | 8811932 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | true |
PageCount | 12 |
PublicationDate | 2019-May |
PublicationTitle | Proceedings / International Conference on Software Engineering |
PublicationTitleAbbrev | ICSE |
PublicationYear | 2019 |
Publisher | IEEE |
StartPage | 795 |
SubjectTerms | Algorithms; automatic documentation generation; code comment generation; Documentation; Java; Natural languages; Software engineering; source code summarization; Task analysis |
Title | A Neural Model for Generating Natural Language Summaries of Program Subroutines |
URI | https://ieeexplore.ieee.org/document/8811932 |
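
The abstract's central design point is that words from code and code structure from the AST are fed to the model as separate inputs, so structure can be learned even when identifier names are uninformative. The toy Python sketch below illustrates that input split under stated assumptions; it is not the authors' implementation or preprocessing pipeline. The tuple-based AST format, the node-type names, and the functions `split_identifier`, `code_words`, and `structure_only` are all hypothetical, and a real pipeline would obtain the AST from a Java parser rather than writing it by hand.

```python
import re
from typing import List

# Hypothetical toy AST format (an assumption for illustration only):
#   inner node: (node_type, [child, child, ...])
#   leaf node:  (node_type, "identifier text")


def split_identifier(name: str) -> List[str]:
    """Split a camelCase or snake_case identifier into lowercase words."""
    words: List[str] = []
    for part in name.split("_"):
        # Acronym runs (e.g. "HTTP"), then capitalized or lowercase words, then digit runs.
        words.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [w.lower() for w in words]


def code_words(node) -> List[str]:
    """Input 1: words from the code text, collected from leaves in pre-order."""
    _, payload = node
    if isinstance(payload, str):
        return split_identifier(payload)
    words: List[str] = []
    for child in payload:
        words.extend(code_words(child))
    return words


def structure_only(node) -> List[str]:
    """Input 2: a bracketed pre-order traversal of node types only.

    Leaf text is dropped, so this sequence encodes AST structure
    but contains no identifier words.
    """
    node_type, payload = node
    seq = ["(", node_type]
    if not isinstance(payload, str):
        for child in payload:
            seq.extend(structure_only(child))
    seq.extend([")", node_type])
    return seq


# Simplified toy AST for a Java method such as: int getSize() { return size; }
# (the return type is omitted to keep the example small).
method = ("MethodDeclaration", [
    ("SimpleName", "getSize"),
    ("Block", [("ReturnStatement", [("SimpleName", "size")])]),
])

# The same method with uninformative identifier names.
obfuscated = ("MethodDeclaration", [
    ("SimpleName", "m1"),
    ("Block", [("ReturnStatement", [("SimpleName", "v1")])]),
])

print(code_words(method))      # ['get', 'size', 'size']
print(code_words(obfuscated))  # ['m', '1', 'v', '1']
print(structure_only(method) == structure_only(obfuscated))  # True
```

Because `structure_only` discards all leaf text, renaming `getSize` and `size` to `m1` and `v1` changes the word input but leaves the structural input identical. Keeping the two sequences separate, rather than interleaving words into the traversal, is what lets a downstream model see the same structural signal whether or not the identifiers are informative.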