FRANC: A Lightweight Framework for High-Quality Code Generation

Bibliographic Details
Published in	Proceedings / IEEE International Working Conference on Source Code Analysis and Manipulation, pp. 106-117
Main Authors	Siddiq, Mohammed Latif; Casey, Beatrice; Santos, Joanna C. S.
Format	Conference Proceeding
Language	English
Published	IEEE 07.10.2024
Subjects	Analytical models; code generation; code quality; code security; Codes; Java; Large language models; Maintenance engineering; Prompt engineering; Source coding; Transformers
Summary:	In recent years, automated source code generation with transformer-based generative models has grown in popularity. These models can generate code according to developers' requirements. However, recent research has shown that automatically generated source code can contain vulnerabilities and other quality issues. Despite researchers' and practitioners' attempts to enhance code generation models, retraining and fine-tuning large language models is time-consuming, resource-intensive, and costly. Thus, in this paper, we describe FRANC, a lightweight framework for recommending more secure and higher-quality source code derived from transformer-based code generation models. FRANC includes a static filter that uses heuristics to make the generated code compilable and a quality-aware ranker that sorts the code snippets by a quality score. Moreover, the framework uses prompt engineering to fix persistent quality issues. We evaluated FRANC with five Python and Java code generation models and six prompt datasets, including one newly created in this work (SOEval). The static filter improves compilability for 9% to 46% of Java suggestions and 10% to 43% of Python suggestions. The ranking system improves the NDCG@10 score by 0.0763 on average, and the repair techniques fix up to 80% of prompts. On average, FRANC takes 1.98 seconds for Java and 0.08 seconds for Python.
ISSN:	2470-6892
DOI:	10.1109/SCAM63643.2024.00020
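
Illustration:	The filter-then-rank idea in the summary can be sketched in a few lines of Python. This is only a rough sketch under stated assumptions, not the authors' implementation: the names is_compilable, count_issue_markers, and filter_and_rank are hypothetical, the filter here simply drops snippets that do not parse (the summary says FRANC uses heuristics to make code compilable), and the quality score is a toy stand-in that counts calls to eval and exec rather than a real quality analyzer.

```python
import ast
from typing import Callable, List


def is_compilable(snippet: str) -> bool:
    """Return True if the snippet parses as Python source (hypothetical filter)."""
    try:
        ast.parse(snippet)
        return True
    except (SyntaxError, ValueError):
        return False


def count_issue_markers(snippet: str) -> int:
    """Toy quality score (an assumption): number of direct eval()/exec() calls."""
    tree = ast.parse(snippet)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id in {"eval", "exec"}
    )


def filter_and_rank(
    snippets: List[str],
    issue_counter: Callable[[str], int] = count_issue_markers,
) -> List[str]:
    """Keep only snippets that parse, then sort them so fewer issues come first."""
    compilable = [s for s in snippets if is_compilable(s)]
    # sorted() is stable, so snippets with equal scores keep their original order.
    return sorted(compilable, key=issue_counter)


if __name__ == "__main__":
    candidates = [
        "def f(x):\n    return eval(x)\n",
        "def f(x):\n    return int(x)\n",
        "def f(x) return x",  # does not parse, so it is filtered out
    ]
    for snippet in filter_and_rank(candidates):
        print(repr(snippet))
```

In a real setting the toy scorer would presumably be replaced by the output of a static analysis tool, and the filter would attempt a repair before discarding a snippet.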