Enhancing English oral translation through cross-modal learning and synchronous optimization

Bibliographic Details
Published in	PLoS ONE, Vol. 20, no. 8, p. e0329381
Main Author	Wang, Yan
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 18.08.2025
Subjects	Accuracy; Algorithms; Analysis; Attention; Bilingualism; Biology and Life Sciences; Computational linguistics; Engineering and Technology; English language; Language processing; Linear transformations; Machine translation; Memory; Natural language interfaces; Neural networks; Optimization; Probability distribution; Real time; Semantics; Social Sciences; Speech; Synchronization; Time synchronization; Translating and interpreting; Translation; United Kingdom

More Information
Summary:	Oral translation in English serves as a critical conduit for international communication and cultural exchange. However, variations in pronunciation and the rapid pace of spoken language limit the effectiveness of current synchronous translation methods. To improve the quality and efficiency of synchronous oral translation, this paper explores integrating cross-modal semantic understanding with synchronous enhancement for English oral translation. A cross-modal translation scenario is implemented first. The text sequence derived from this process is then fused with the original speech features via Bidirectional Encoder Representations from Transformers (BERT). The cross-information between modalities is explored, and linear transformation optimization is applied to the self-attention mechanism in the Transformer to achieve context-aware understanding of the transcribed speech. Finally, dynamic time warping (DTW) is integrated to enhance real-time synchronization between speech and text, thereby improving translation fluency. Experimental results show that, compared with the existing bilingual attention neural machine translation (NMT) model and the context-aware NMT model, the proposed model achieves an average bilingual evaluation understudy (BLEU) score that is 9.3% and 26.9% higher, respectively, and a synchronization speed that is 17.9% and 16.8% higher, respectively. These findings suggest that the fusion model, which incorporates context-awareness and an attention mechanism in cross-modal translation, can significantly improve the quality and efficiency of English oral translation, offering a new approach to the synchronous translation of spoken English.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0329381
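
The summary names dynamic time warping (DTW) as the step that synchronizes speech with text. As a rough illustration of what DTW computes, the sketch below aligns two short numeric sequences with the classic dynamic-programming recurrence. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `dtw`, the scalar inputs, and the absolute-difference cost are all hypothetical choices, since this record does not describe the feature representation or distance measure the authors use.

```python
import math


def dtw(a, b):
    """Align two numeric sequences with classic DTW.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs matching a[i] to b[j]. The absolute difference is used as the
    local cost; this is an illustrative assumption only.
    """
    n, m = len(a), len(b)
    # acc[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            acc[i][j] = local + min(
                acc[i - 1][j - 1],  # advance both sequences
                acc[i - 1][j],      # advance a only
                acc[i][j - 1],      # advance b only
            )

    # Backtrack from the end to recover one optimal warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path


if __name__ == "__main__":
    # Toy stand-ins for a speech-frame track and a text-token track.
    cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
    print(cost, path)
```

In this toy call the second sequence repeats one value, so the warping path maps a single element of the first sequence to two elements of the second, which is the kind of elastic matching that lets a slower or faster speech stream stay aligned with its text.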