The Fault in our Stars: Quality Assessment of Code Generation Benchmarks

Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can provide a false sense of performance. In this work, we condu...

Bibliographic Details
Published in	Proceedings / IEEE International Working Conference on Source Code Analysis and Manipulation, pp. 201-212
Main Authors	Siddiq, Mohammed Latif; Dristi, Simantika; Saha, Joy; Santos, Joanna C. S.
Format	Conference Proceeding
Language	English
Published	IEEE, 7 October 2024
Subjects	Benchmark testing; benchmarks; code generation; Codes; Contamination; data contamination; data quality; Documentation; Java; Large language models; Python; Quality assessment; Software; Source coding