An Empirical Understanding of Code Clone Detection by ChatGPT
Published in | 2023 6th International Conference on Data Science and Information Technology (DSIT) pp. 78 - 83 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 28.07.2023 |
Summary: | As one of the most popular recent NLP models, ChatGPT has been applied with remarkable success to a variety of NLP tasks. Code clone detection, a typical prediction task in software engineering, has been studied for years. However, ChatGPT has not been systematically evaluated on code clone detection. To fill this gap, we construct a dataset that covers multiple types of code data and conduct the first empirical study of ChatGPT on the clone detection task, on both source code and binary code. Our study found that ChatGPT can successfully detect code clones and accurately explain code semantics in most simple cases. However, in complex binary code scenarios, ChatGPT's performance is limited. Our work shows that ChatGPT has difficulty identifying the semantics of long assembly code. The results and findings of our research can help developers better apply large intelligent models to prediction tasks in the software engineering field. |
---|---|
DOI: | 10.1109/DSIT60026.2023.00021 |