The Fault in our Stars: Quality Assessment of Code Generation Benchmarks
Large Language Models (LLMs) are gaining popularity among software engineers. A crucial aspect of developing effective code generation LLMs is to evaluate these models using a robust benchmark. Evaluation benchmarks with quality issues can provide a false sense of performance. In this work, we condu...
| Published in | Proceedings / IEEE International Working Conference on Source Code Analysis and Manipulation, pp. 201 - 212 |
| --- | --- |
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 07.10.2024 |