Enhancing English oral translation through cross-modal learning and synchronous optimization
Published in | PloS one Vol. 20; no. 8; p. e0329381 |
---|---|
Main Author | |
Format | Journal Article |
Language | English |
Published | United States, Public Library of Science (PLoS), 18.08.2025 |
Subjects | |
Summary: | Oral translation in English is a critical conduit for international communication and cultural exchange. However, widespread variation in pronunciation and the rapid pace of spoken language limit the effectiveness of current synchronous translation methods. To improve the quality and efficiency of synchronous oral translation, this paper explores the integration of cross-modal semantic understanding and synchronous enhancement for English oral translation. First, a cross-modal translation scenario is implemented. The text sequence derived from this process is then fused with the original speech features via Bidirectional Encoder Representations from Transformers (BERT). The cross-information between modalities is explored, and the self-attention mechanism in the Transformer is optimized by linear transformation to achieve context-aware understanding of the transcribed oral text. Finally, dynamic time warping (DTW) is integrated to enhance real-time synchronization between speech and text, thereby improving translation fluency. Experimental results show that, compared with the existing bilingual attention neural machine translation (NMT) model and the context-aware NMT model, the proposed model achieves an average bilingual evaluation understudy (BLEU) score that is 9.3% and 26.9% higher, respectively. Its synchronization speed is also 17.9% and 16.8% higher than that of the other two models, respectively. These findings suggest that a fusion model combining context-awareness and an attention mechanism in cross-modal translation can significantly improve the quality and efficiency of English oral translation, offering a novel approach to the synchronous translation of spoken English. |
---|---|
Bibliography: | Competing Interests: The authors have declared that no competing interests exist. |
ISSN: | 1932-6203 |
DOI: | 10.1371/journal.pone.0329381 |
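
The summary names dynamic time warping (DTW) as the step that synchronizes speech and text, but this record does not say how the paper applies it. As a rough illustration only, the sketch below shows textbook DTW on two one-dimensional sequences with an absolute-difference cost. The function name `dtw`, the scalar inputs, and the cost function are assumptions made for this example and are not taken from the paper, which presumably aligns multi-dimensional speech and text features.

```python
import math


def dtw(a, b):
    """Classic dynamic time warping between two sequences of numbers.

    Returns (total_cost, path), where path is a list of (i, j) index
    pairs aligning a[i] with b[j]. Illustrative only: the cost function
    and scalar inputs are assumptions, not the paper's configuration.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance (assumed)
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b[j-1] is reused
                cost[i][j - 1],      # b advances, a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )

    # Backtrack from the end to recover one optimal alignment path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    # The second sequence repeats a value; DTW absorbs the repeat.
    total, alignment = dtw([1, 2, 3], [1, 2, 2, 3])
    print(total, alignment)
```

In this toy case the repeated value in the second sequence is matched to the same element of the first, which is the behavior that makes DTW useful when two streams run at different speeds. Real-time use would typically need a windowed or online variant, which this sketch does not attempt.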