An Empirical Understanding of Code Clone Detection by ChatGPT

Bibliographic Details
Published in: 2023 6th International Conference on Data Science and Information Technology (DSIT), pp. 78-83
Main Authors: Wang, PeiJie; Zhu, Lu; Wang, Qianlu; Jaiteh, Ousainou; Guo, Chenkai
Format: Conference Proceeding
Language: English
Published: IEEE, 28.07.2023
Summary: ChatGPT, one of the most popular recent NLP models, has been applied with remarkable success to a variety of NLP tasks. Code clone detection, a typical prediction task in software engineering, has been studied for years. However, ChatGPT has not been systematically evaluated on code clone detection. To fill this gap, we construct a dataset that covers multiple types of code data and conduct the first empirical study of ChatGPT on the clone detection task, for both source code and binary code. Our study finds that ChatGPT can successfully detect code clones and accurately explain code semantics in most simple cases. In complex binary code scenarios, however, its performance is limited: our work shows that ChatGPT has difficulty identifying the semantics of long assembly code. These results and findings can help developers better apply large AI models to prediction tasks in software engineering.
DOI: 10.1109/DSIT60026.2023.00021