A Self-Evaluated Bilingual Automatic Speech Recognition System for Mandarin–English Mixed Conversations

Bibliographic Details
Published in	Applied sciences Vol. 15; no. 14; p. 7691
Main Authors	Hai, Xinhe; Aranganadin, Kaviya; Yeh, Cheng-Cheng; Hua, Zhengmao; Huang, Chen-Yun; Hsu, Hua-Yi; Lin, Ming-Chieh
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.07.2025
Subjects	Accuracy; Acoustics; Algorithms; API; Applications programming; automatic speech recognition; bilingual; Bilingualism; Communication; Datasets; Dictionaries; English language; Error analysis; Evaluation; Human-computer interaction; Language; Machine learning; Mandarin; Mandarin–English; mixed error rate; Multilingualism; Performance evaluation; Phonetics; Self evaluation; Speech; Speech recognition; Speech recognition software; Voice recognition; United Kingdom; Taiwan; Washington, D.C
Summary:	Bilingual communication is increasingly prevalent in a globally connected world, where cultural exchange and international interaction are unavoidable. Existing automatic speech recognition (ASR) systems are often limited to a single language, yet demand for bilingual ASR in human–computer interaction is growing, particularly in medical services. This article addresses that need by building an application programming interface (API)-based platform on VOSK, a popular open-source single-language ASR toolkit, to efficiently deploy a self-evaluated bilingual ASR system that handles both the primary and the secondary language in tasks such as Mandarin–English mixed-speech recognition. The mixed error rate (MER) is used as the performance metric, and a workflow is outlined for calculating it with the edit distance algorithm. After the self-evaluation framework and mixed-language algorithms are implemented, the Mandarin–English MER drops from ∼65% to under 13%. These findings highlight the importance of a well-designed system for managing the complexities of mixed-language speech recognition and offer a promising method for building a bilingual ASR system from existing monolingual models. The framework might be further extended to a trilingual or multilingual ASR system by preparing mixed-language datasets and further computer development, without involving complex training.
ISSN:	2076-3417
DOI:	10.3390/app15147691
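
The summary states that the mixed error rate (MER) is calculated with the edit distance algorithm but does not give the details. The following is a minimal sketch of one common way to do this, not the authors' implementation. It assumes that each Mandarin character and each English word counts as one token, that English is lowercased before comparison, and that MER is the token-level edit distance divided by the number of reference tokens. The names tokenize_mixed, edit_distance, and mixed_error_rate are hypothetical.

```python
import re

# Assumption: one token per CJK character (basic Unified Ideographs block),
# one token per run of Latin letters, digits, or apostrophes.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize_mixed(text):
    """Split mixed Mandarin-English text into comparison tokens."""
    return [token.lower() for token in _TOKEN.findall(text)]


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def mixed_error_rate(reference, hypothesis):
    """Edit distance over tokens, normalized by reference length."""
    ref = tokenize_mixed(reference)
    hyp = tokenize_mixed(hypothesis)
    if not ref:
        raise ValueError("reference contains no tokens")
    return edit_distance(ref, hyp) / len(ref)
```

For example, mixed_error_rate("我想要 book 一個 appointment", "我想要 look 一個 appointment") compares seven reference tokens with one substitution and returns 1/7. Counting Mandarin by character and English by word avoids depending on a Mandarin word segmenter, which is why this tokenization is a common choice for mixed-language scoring.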