Enhancing English oral translation through cross-modal learning and synchronous optimization

Bibliographic Details
Published in: PLoS ONE, Vol. 20, no. 8, p. e0329381
Main Author: Wang, Yan
Format: Journal Article
Language: English
Published: United States, Public Library of Science (PLoS), 18 August 2025
Summary: Oral translation in English serves as a critical conduit for international communication and cultural exchange. However, wide variation in pronunciation and the rapid pace of spoken language currently limit the effectiveness of synchronous translation methods. To improve the quality and efficiency of synchronous oral translation, this paper integrates cross-modal semantic understanding with synchronization enhancement for English oral translation. The approach begins by constructing a cross-modal translation scenario. The text sequence produced in this step is then fused with the original speech features using Bidirectional Encoder Representations from Transformers (BERT). Cross-modal interactions are modeled, and the self-attention mechanism of the Transformer is optimized through linear transformations so that the model becomes context-aware and can understand transcribed speech. Finally, dynamic time warping (DTW) is incorporated to improve real-time synchronization between speech and text, which improves translation fluency. Experimental results show that, compared with the existing bilingual attention neural machine translation (NMT) model and the context-aware NMT model, the proposed model achieves average bilingual evaluation understudy (BLEU) scores that are 9.3% and 26.9% higher, respectively, and synchronization speeds that are 17.9% and 16.8% higher, respectively. These findings suggest that a fusion model combining context awareness with an attention mechanism in cross-modal translation can substantially improve the quality and efficiency of English oral translation, offering a new approach to the synchronous translation of spoken English.
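The summary states that the transcribed text is fused with the original speech features and that cross-modal information is captured through Transformer self-attention, but it does not give the formulation. As a rough illustration only, the sketch below shows generic scaled dot-product cross-attention in which text tokens (queries) attend over speech frames (keys and values), each produced by a linear projection. The function names, matrix shapes, and the choice of cross-attention are assumptions rather than the paper's method, and BERT itself is not reproduced.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(text: Matrix, speech: Matrix,
                          w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Hypothetical sketch: text tokens (queries) attend over speech frames.

    text:   n_tokens x d_text      speech: n_frames x d_speech
    w_q:    d_text x d             w_k, w_v: d_speech x d  (linear projections)
    Returns n_tokens x d speech-informed token representations.
    """
    q = matmul(text, w_q)    # project text into query space
    k = matmul(speech, w_k)  # project speech into key space
    v = matmul(speech, w_v)  # project speech into value space
    d = len(w_q[0])
    scores = matmul(q, transpose(k))  # n_tokens x n_frames similarity scores
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, v)  # weighted average of speech values per token
```

Because each token's attention weights sum to 1, a token attending over identical speech frames simply recovers their projected value. In a trained model the projection matrices would be learned rather than fixed.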
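The summary credits DTW with improving speech-text synchronization but does not say which sequences are aligned or which distance is used. The minimal sketch below assumes both streams are represented as per-step feature vectors (for example, speech frames and token embeddings projected into a shared space) and aligns them with textbook DTW under Euclidean distance. The function names and inputs are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(x: List[Sequence[float]],
        y: List[Sequence[float]]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic dynamic time warping between two feature sequences.

    Returns the minimal cumulative cost and the warping path as (i, j) index pairs.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # x advances, y repeats
                                 cost[i][j - 1],      # y advances, x repeats
                                 cost[i - 1][j - 1])  # both advance
    # Backtrack from the end to recover the alignment path (diagonal wins ties).
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = {(i - 1, j - 1): cost[i - 1][j - 1],
                 (i - 1, j): cost[i - 1][j],
                 (i, j - 1): cost[i][j - 1]}
        i, j = min(steps, key=steps.get)
    path.reverse()
    return cost[n][m], path
```

For x = [[0], [1], [2]] and y = [[0], [0], [1], [2]], the alignment cost is 0 and the path maps x[0] to both leading frames of y. The full cost table makes this O(n·m) in time and memory; real-time systems commonly restrict the search to a band around the diagonal (such as the Sakoe-Chiba band) to bound latency.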
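The reported quality gains are expressed in BLEU. The summary does not specify the BLEU configuration (tokenization, n-gram order, smoothing, or sentence- versus corpus-level scoring), so the sketch below assumes standard unsmoothed BLEU-4 for one candidate against one reference: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. The function names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: List[str], reference: List[str], max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU with a single reference."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(clipped / total))
    c, r = len(candidate), len(reference)
    # Penalize candidates shorter than the reference.
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)
```

An exact match scores 1.0, while dropping the final word of "the cat sat on the mat" keeps every precision at 1 but applies a brevity penalty of exp(1 - 6/5) ≈ 0.819. Published comparisons usually rely on a standardized implementation such as sacreBLEU rather than a hand-rolled scorer.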
Competing Interests: The authors have declared that no competing interests exist.
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0329381