AugmentedCode: Examining the Effects of Natural Language Resources in Code Retrieval Models

Bibliographic Details
Published in: arXiv.org
Main Authors: Bahrami, Mehdi; Shrikanth, N C; Mizobuchi, Yuji; Liu, Lei; Fukuyori, Masahiro; Wei-Peng, Chen; Munakata, Kazuki
Format: Paper
Language: English
Published: Ithaca, Cornell University Library, arXiv.org, 16.10.2021
Summary: Code retrieval allows software engineers to search for code with a natural language query, and it relies on both natural language processing and software engineering techniques. There have been several attempts at code retrieval, ranging from searching code snippets to searching function-level code. In this paper, we introduce Augmented Code (AugmentedCode) retrieval, which takes advantage of existing information within the code and constructs an augmented programming language to improve the performance of code retrieval models. We curated a large corpus of Python and showcase the framework and the results of the augmented programming language, which outperforms CodeSearchNet and CodeBERT with a Mean Reciprocal Rank (MRR) of 0.73 and 0.96, respectively. The fine-tuned augmented code retrieval model is published on HuggingFace at https://huggingface.co/Fujitsu/AugCode, and a demonstration video is available at https://youtu.be/mnZrUTANjGs.
ISSN: 2331-8422
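Note on the metric: the summary reports results as Mean Reciprocal Rank (MRR). For readers unfamiliar with it, the following is a minimal sketch of how MRR is commonly computed for a retrieval task: for each query, take the reciprocal of the rank at which the correct item appears (0 if it is absent), then average over all queries. The function names and toy data are illustrative assumptions; they are not taken from the paper or from the released model.

```python
from typing import Hashable, List, Sequence, Tuple


def reciprocal_rank(ranked_ids: Sequence[Hashable], relevant_id: Hashable) -> float:
    """Return 1/rank of the relevant item (1-based), or 0.0 if it is not retrieved."""
    for position, candidate in enumerate(ranked_ids, start=1):
        if candidate == relevant_id:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    results: List[Tuple[Sequence[Hashable], Hashable]]
) -> float:
    """Average the reciprocal rank over all (ranked list, relevant id) pairs."""
    if not results:
        raise ValueError("at least one query is required to compute MRR")
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in results) / len(results)


# Hypothetical toy data: three queries, each with a ranked list of code ids
# and the id of the single correct code for that query.
toy_results = [
    (["f1", "f2", "f3"], "f1"),  # correct code ranked 1st -> 1.0
    (["f1", "f2", "f3"], "f2"),  # correct code ranked 2nd -> 0.5
    (["f1", "f2", "f3"], "f9"),  # correct code not retrieved -> 0.0
]

mrr = mean_reciprocal_rank(toy_results)
print(f"MRR = {mrr:.2f}")
```

An MRR close to 1 means the correct code is usually ranked first; a lower value means it tends to appear further down the ranked list or not at all.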