A Neural Model for Generating Natural Language Summaries of Program Subroutines

Bibliographic Details
Published in Proceedings / International Conference on Software Engineering, pp. 795-806
Main Authors LeClair, Alexander; Jiang, Siyuan; McMillan, Collin
Format Conference Proceeding
Language English
Published IEEE 01.05.2019
Abstract Source code summarization -- creating natural language descriptions of source code behavior -- is a rapidly growing research topic with applications to automatic documentation generation, program comprehension, and software maintenance. Traditional techniques relied on heuristics and templates built manually by human experts. Recently, data-driven approaches based on neural machine translation have largely overtaken template-based systems. But nearly all of these techniques rely almost entirely on programs having good internal documentation; without clear identifier names, the models fail to create good summaries. In this paper, we present a neural model that combines words from code with code structure from an AST. Unlike previous approaches, our model processes each data source as a separate input, which allows the model to learn code structure independent of the text in code. This process helps our approach provide coherent summaries in many cases even when zero internal documentation is provided. We evaluate our technique with a dataset we created from 2.1 million Java methods. We find improvement over two baseline techniques from SE literature and one from NLP literature.
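The abstract's idea of treating words from code and AST structure as two separate inputs can be illustrated with a small sketch. This is not the authors' preprocessing: as an assumption for illustration, it uses Python's standard `ast` module on a Python snippet as a stand-in for a Java AST, and the helper names (`split_identifier`, `code_words`, `structure_sequence`) are hypothetical.

```python
import ast
import re


def split_identifier(name):
    """Split a snake_case or camelCase identifier into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name).replace("_", " ")
    return [part.lower() for part in spaced.split()]


def code_words(tree):
    """Text input: words taken from identifiers, with no structure."""
    words = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            words += split_identifier(node.name)
        elif isinstance(node, ast.Name):
            words += split_identifier(node.id)
        elif isinstance(node, ast.arg):
            words += split_identifier(node.arg)
    return words


def structure_sequence(node):
    """Structure input: pre-order list of node type names, identifiers dropped."""
    seq = [type(node).__name__]
    for child in ast.iter_child_nodes(node):
        seq += structure_sequence(child)
    return seq


# Hypothetical example method (Python stand-in for a Java method).
SOURCE = "def getTotal(items):\n    return sum(items)\n"

tree = ast.parse(SOURCE)
words = code_words(tree)              # would feed a text encoder
structure = structure_sequence(tree)  # would feed a separate structure encoder

print(words)
print(structure)
```

Because the structure sequence contains only node types, renaming every identifier in `SOURCE` would leave it unchanged, which is one way to read the abstract's claim that structure can be learned independently of the text in code.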
Author_xml – sequence: 1
  givenname: Alexander
  surname: LeClair
  fullname: LeClair, Alexander
  organization: University of Notre Dame
– sequence: 2
  givenname: Siyuan
  surname: Jiang
  fullname: Jiang, Siyuan
  organization: Eastern Michigan University
– sequence: 3
  givenname: Collin
  surname: McMillan
  fullname: McMillan, Collin
  organization: University of Notre Dame
CODEN IEEPAD
DOI 10.1109/ICSE.2019.00087
Discipline Computer Science
EISBN 9781728108698
1728108691
EISSN 1558-1225
EndPage 806
ExternalDocumentID 8811932
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 12
PublicationCentury 2000
PublicationDate 2019-May
PublicationDateYYYYMMDD 2019-05-01
PublicationDate_xml – month: 05
  year: 2019
  text: 2019-May
PublicationDecade 2010
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 795
SubjectTerms Algorithms
automatic documentation generation
code comment generation
Documentation
Java
Natural languages
Software engineering
source code summarization
Task analysis
Title A Neural Model for Generating Natural Language Summaries of Program Subroutines
URI https://ieeexplore.ieee.org/document/8811932