HedgeCode: A Multi-Task Hedging Contrastive Learning Framework for Code Search

Bibliographic Details
Published in: Proceedings / International Conference on Software Engineering, pp. 1857-1868
Main Authors: Chen, Gong; Xie, Xiaoyuan; Tang, Daniel; Xin, Qi; Liu, Wenjie
Format: Conference Proceeding
Language: English
Published: IEEE, 26.04.2025
Summary: Code search is a vital activity in software engineering, focused on identifying and retrieving the correct code snippets based on a query provided in natural language. Approaches based on deep learning techniques have been increasingly adopted for this task, enhancing the initial representations of both code and its natural language descriptions. Despite this progress, there remains an unexplored gap in ensuring consistency between the representation spaces of code and its descriptions. Furthermore, existing methods have not fully leveraged the potential relevance between code snippets and their descriptions, making it difficult to discern fine-grained semantic distinctions among similar code snippets. To address these challenges, we introduce a multi-task hedging contrastive learning framework for code search, referred to as HedgeCode. HedgeCode is structured around two primary training phases. The first phase, the representation alignment stage, proposes a hedging contrastive learning approach. This method aims to detect subtle differences between code and natural language text, thereby aligning their representation spaces by identifying relevance. The subsequent phase involves multi-task joint learning, wherein the previously trained model serves as the encoder. This stage optimizes the model through a combination of supervised and self-supervised contrastive learning tasks. Our framework's effectiveness is demonstrated on the CodeSearchNet benchmark, showing HedgeCode's ability to address the aforementioned limitations in code search.
ISSN: 1558-1225
DOI: 10.1109/ICSE55347.2025.00008
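The summary does not give the paper's exact objectives, so the following is only a minimal PyTorch sketch of the contrastive-learning machinery the framework builds on: an in-batch InfoNCE loss aligning code and description embeddings (phase one), combined with a weighted supervised/self-supervised multi-task objective (phase two). The function names, the temperature, the `alpha` weighting, and the use of an augmented code view are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(code_emb, desc_emb, temperature=0.05):
    """Symmetric in-batch InfoNCE loss over paired (code, description)
    embeddings of shape (batch, dim): matched pairs are positives,
    all other in-batch pairs serve as negatives."""
    code_emb = F.normalize(code_emb, dim=-1)
    desc_emb = F.normalize(desc_emb, dim=-1)
    # (batch, batch) cosine-similarity matrix, scaled by temperature.
    logits = code_emb @ desc_emb.t() / temperature
    targets = torch.arange(code_emb.size(0), device=code_emb.device)
    # Average the two retrieval directions (code->desc and desc->code).
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def joint_loss(code_emb, desc_emb, code_emb_aug, alpha=0.5):
    """Illustrative multi-task objective: a supervised contrastive term
    on (code, description) pairs plus a self-supervised term on two
    views of the same code (e.g. dropout-perturbed encodings).
    `alpha` is a placeholder weighting, not a value from the paper."""
    supervised = info_nce_loss(code_emb, desc_emb)
    self_supervised = info_nce_loss(code_emb, code_emb_aug)
    return alpha * supervised + (1.0 - alpha) * self_supervised
```

In such a setup, retrieval at inference time reduces to ranking candidate code snippets by cosine similarity to the query embedding; the contrastive training pulls matched code/description pairs together in the shared space while pushing apart the near-miss negatives that make fine-grained distinctions hard.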