The Fault in our Stars: Quality Assessment of Code Generation Benchmarks

Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can provide a false sense of performance. In this work, we condu...

Bibliographic Details
Published in Proceedings / IEEE International Working Conference on Source Code Analysis and Manipulation, pp. 201-212
Main Authors Siddiq, Mohammed Latif; Dristi, Simantika; Saha, Joy; Santos, Joanna C. S.
Format Conference Proceeding
Language English
Published IEEE 07.10.2024
