INTERTRANS: Leveraging Transitive Intermediate Translations to Enhance LLM-Based Code Translation

Bibliographic Details
Published in	Proceedings / International Conference on Software Engineering, pp. 1153-1164
Main Authors	Macedo, Marcos; Tian, Yuan; Nie, Pengyu; Cogo, Filipe R.; Adams, Bram
Format	Conference Proceeding
Language	English
Published	IEEE 26.04.2025
Subjects	Accuracy; automated code translation; Benchmark testing; Codes; Computer languages; intermediate representation; Large language models; LLM; Multilingual; Semantics; Software engineering; Syntactics; Translation; tree of code translation
Summary:	Code translation aims to convert a program from one programming language (PL) to another. This long-standing software engineering task is crucial for modernizing legacy systems, ensuring cross-platform compatibility, enhancing performance, and more. However, automating this process remains challenging due to many syntactic and semantic differences between PLs. Recent studies show that even advanced techniques such as large language models (LLMs), especially open-source LLMs, still struggle with the task. Currently, code LLMs are trained with source code from multiple programming languages, thus presenting multilingual capabilities. In this paper, we investigate whether such capabilities can be harnessed to enhance code translation. To achieve this goal, we introduce INTERTRANS, an LLM-based automated code translation approach that, in contrast to existing approaches, leverages intermediate translations to bridge the syntactic and semantic gaps between source and target PLs. INTERTRANS contains two stages. It first utilizes a novel Tree of Code Translation (ToCT) algorithm to plan transitive intermediate translation sequences between a given source and target PL, then validates them in a specific order. We evaluate INTERTRANS with three open LLMs on three benchmarks (i.e., CodeNet, HumanEval-X, and TransCoder) involving six PLs. Results show an absolute improvement of 18.3% to 43.3% in Computation Accuracy (CA) for INTERTRANS over Direct Translation with 10 attempts. The best-performing variant of INTERTRANS (with the Magicoder LLM) achieved an average CA of 87.3%-95.4% on three benchmarks.
ISSN:	1558-1225
DOI:	10.1109/ICSE55347.2025.00236
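
The two stages described in the summary (planning transitive translation sequences, then validating them in order) can be illustrated with a minimal Python sketch. This is not the authors' ToCT implementation: the function names `plan_paths` and `translate_via_paths`, the `max_edges` limit, the shortest-path-first ordering, and the stop-at-first-success rule are all assumptions made for illustration, and `translate` and `passes_tests` are hypothetical callbacks standing in for an LLM call and a test-suite run.

```python
from itertools import permutations
from typing import Callable, List, Optional, Tuple

Path = Tuple[str, ...]


def plan_paths(source: str, target: str, languages: List[str], max_edges: int) -> List[Path]:
    """Enumerate paths source -> ... -> target using at most `max_edges` translation steps.

    Paths come out shortest first (an assumed ordering), so the direct
    translation is always tried before any intermediate route.
    """
    intermediates = [pl for pl in languages if pl not in (source, target)]
    paths: List[Path] = []
    # k is the number of intermediate languages; a path with k intermediates has k + 1 edges.
    for k in range(max_edges):
        for mids in permutations(intermediates, k):
            paths.append((source, *mids, target))
    return paths


def translate_via_paths(
    code: str,
    paths: List[Path],
    translate: Callable[[str, str, str], str],  # hypothetical: (code, src_pl, dst_pl) -> code
    passes_tests: Callable[[str], bool],        # hypothetical: runs the target-language tests
) -> Optional[Tuple[Path, str]]:
    """Follow each planned path in order and return the first translation that passes."""
    for path in paths:
        current = code
        for src, dst in zip(path, path[1:]):
            current = translate(current, src, dst)
        if passes_tests(current):
            return path, current
    return None


if __name__ == "__main__":
    pls = ["Python", "Java", "C++", "Go"]
    for p in plan_paths("Python", "Java", pls, max_edges=2):
        print(" -> ".join(p))
```

A fuller implementation would likely cache translations shared by paths with a common prefix instead of recomputing them, which is what makes a tree a natural structure for the plan; the sketch omits this for brevity.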