Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks

Bibliographic Details
Published in Proceedings / International Conference on Software Engineering, pp. 336–347
Main Authors Mastropaolo, Antonio, Scalabrino, Simone, Cooper, Nathan, Nader Palacio, David, Poshyvanyk, Denys, Oliveto, Rocco, Bavota, Gabriele
Format Conference Proceeding
Language English
Published IEEE 01.05.2021
Abstract Deep learning (DL) techniques are gaining more and more attention in the software engineering community. They have been used to support several code-related tasks, such as automatic bug fixing and code comments generation. Recent studies in the Natural Language Processing (NLP) field have shown that the Text-To-Text Transfer Transformer (T5) architecture can achieve state-of-the-art performance for a variety of NLP tasks. The basic idea behind T5 is to first pre-train a model on a large and generic dataset using a self-supervised task (e.g., filling masked words in sentences). Once the model is pre-trained, it is fine-tuned on smaller and specialized datasets, each one related to a specific task (e.g., language translation, sentence classification). In this paper, we empirically investigate how the T5 model performs when pre-trained and fine-tuned to support code-related tasks. We pre-train a T5 model on a dataset composed of natural language English text and source code. Then, we fine-tune such a model by reusing datasets used in four previous works that used DL techniques to: (i) fix bugs, (ii) inject code mutants, (iii) generate assert statements, and (iv) generate code comments. We compared the performance of this single model with the results reported in the four original papers proposing DL-based solutions for those four tasks. We show that our T5 model, exploiting additional data for the self-supervised pre-training phase, can achieve performance improvements over the four baselines.
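The self-supervised pre-training task the abstract mentions (filling masked spans in sequences) can be illustrated with a minimal sketch of T5-style span corruption. The `span_corrupt` function, the `<extra_id_i>` sentinel convention, and the whitespace-tokenized Java snippet below are illustrative assumptions, not the paper's actual pipeline:

```python
import random

def span_corrupt(tokens, n_spans=1, span_len=1, seed=0):
    """Replace `n_spans` token spans with sentinel tokens, T5-style.

    Returns (input_tokens, target_tokens): the input has each masked
    span replaced by <extra_id_i>; the target lists each sentinel
    followed by the tokens it hides.
    """
    rng = random.Random(seed)
    tokens = list(tokens)
    # pick distinct candidate span starts, left to right
    starts = sorted(rng.sample(range(len(tokens) - span_len + 1), n_spans))
    inp, tgt, i, sid = [], [], 0, 0
    for s in starts:
        if s < i:  # skip a pick that overlaps the previous span
            continue
        inp.extend(tokens[i:s])
        sentinel = f"<extra_id_{sid}>"
        inp.append(sentinel)
        tgt.append(sentinel)
        tgt.extend(tokens[s:s + span_len])
        i, sid = s + span_len, sid + 1
    inp.extend(tokens[i:])
    return inp, tgt

# Example: corrupt a tokenized code snippet (hypothetical tokenization).
code = "public int sum ( int a , int b ) { return a + b ; }".split()
x, y = span_corrupt(code, n_spans=2, span_len=2)
```

The model is trained to emit the target given the corrupted input; because the sentinels pair up the two sequences, the original text is fully recoverable from (input, target), which is what makes the objective self-supervised.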
Author_xml – sequence: 1
  givenname: Antonio
  surname: Mastropaolo
  fullname: Mastropaolo, Antonio
  organization: Università della Svizzera italiana (USI), Switzerland
– sequence: 2
  givenname: Simone
  surname: Scalabrino
  fullname: Scalabrino, Simone
  organization: University of Molise, Italy
– sequence: 3
  givenname: Nathan
  surname: Cooper
  fullname: Cooper, Nathan
  organization: William and Mary, USA
– sequence: 4
  givenname: David
  surname: Nader Palacio
  fullname: Nader Palacio, David
  organization: William and Mary, USA
– sequence: 5
  givenname: Denys
  surname: Poshyvanyk
  fullname: Poshyvanyk, Denys
  organization: William and Mary, USA
– sequence: 6
  givenname: Rocco
  surname: Oliveto
  fullname: Oliveto, Rocco
  organization: University of Molise, Italy
– sequence: 7
  givenname: Gabriele
  surname: Bavota
  fullname: Bavota, Gabriele
  organization: Università della Svizzera italiana (USI), Switzerland
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICSE43902.2021.00041
Discipline Computer Science
EndPage 347
ExternalDocumentID 9401982
Genre orig-research
GrantInformation_xml – fundername: European Research Council
  funderid: 10.13039/501100000781
ISBN 1665402962
9781665402965
ISSN 1558-1225
Language English
PageCount 12
PublicationCentury 2000
PublicationDate 2021-05
PublicationDecade 2020
PublicationTitle Proceedings / International Conference on Software Engineering
PublicationTitleAbbrev ICSE
PublicationYear 2021
Publisher IEEE
StartPage 336
SubjectTerms Computer bugs
Deep learning
Empirical software engineering
Filling
Natural language processing
Software
Software engineering
Task analysis
Title Studying the Usage of Text-To-Text Transfer Transformer to Support Code-Related Tasks
URI https://ieeexplore.ieee.org/document/9401982