Evaluating Fault Localization and Program Repair Capabilities of Existing Closed-Source General-Purpose LLMs

Bibliographic Details
Published in: 2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), pp. 75-78
Main Authors: Jiang, Shengbei; Zhang, Jiabao; Chen, Wei; Wang, Bo; Zhou, Jianyi; Zhang, Jie M.
Format: Conference Proceeding
Language: English
Published: ACM, 20.04.2024
DOI: 10.1145/3643795.3648390

Summary: Automated debugging is an emerging research field that aims to find and repair bugs automatically. Within this field, Fault Localization (FL) and Automated Program Repair (APR) attract the most research effort. Recently, researchers have adopted pre-trained Large Language Models (LLMs) to facilitate FL and APR, with promising results. However, the LLMs they used have either been discontinued (such as Codex) or become outdated (such as early versions of GPT). In this paper, we evaluate the performance of three recent commercial closed-source general-purpose LLMs on FL and APR: ChatGPT 3.5, ERNIE Bot 3.5, and IFlytek Spark 2.0. We evaluate them on 120 real-world Java bugs from the Defects4J benchmark. For both FL and APR, we design three kinds of prompts, each incorporating different kinds of information. The results show that these LLMs successfully locate 53.3% and correctly fix 12.5% of the bugs.

CCS Concepts: • Software and its engineering → Search-based software engineering; Software testing and debugging.
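The summary describes prompt variants that carry different kinds of information for each task. A minimal sketch of how such context-graded fault-localization prompts could be assembled is shown below; the template wording, function name, and the toy Java snippet are illustrative assumptions, not the paper's actual prompts.

```python
# Hypothetical sketch of assembling FL prompt variants with increasing
# context (code only; code + failing test; code + test + error message),
# in the spirit of the evaluation setup described in the abstract.
# All names and template text here are assumptions for illustration.

def build_fl_prompt(code, failing_test=None, error_message=None):
    """Build a fault-localization prompt; the optional fields mimic
    prompt variants that carry different kinds of information."""
    parts = [
        "Identify the buggy line(s) in the following Java method:",
        code,
    ]
    if failing_test:
        parts.append("Failing test:\n" + failing_test)
    if error_message:
        parts.append("Error message:\n" + error_message)
    return "\n\n".join(parts)

BUGGY_METHOD = """public int add(int a, int b) {
    return a - b;  // bug: should be a + b
}"""

# Variant 1: code only; variant 2: code plus a failing test.
prompt_basic = build_fl_prompt(BUGGY_METHOD)
prompt_with_test = build_fl_prompt(
    BUGGY_METHOD, failing_test="assertEquals(3, add(1, 2));")
```

The same pattern extends naturally to APR prompts by swapping the instruction line for a repair request (e.g., "Provide a fixed version of the method").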