Noise-Robust Automatic Speech Recognition: A Case Study for Communication Interference

Bibliographic Details
Published in	Journal on Interactive Systems Vol. 15; no. 1; pp. 670 - 681
Main Authors	Duarte, Julio Cesar, Colcher, Sérgio
Format	Journal Article
Language	English
Published	Brazilian Computer Society 09.07.2024
Subjects	Automatic Speech Recognition Systems; Noise Robustness; Portuguese ASRs

More Information
Summary:	An Automatic Speech Recognition (ASR) system is a software tool that converts a speech audio waveform into the corresponding text transcription. ASR systems are usually built with Artificial Intelligence techniques, particularly Machine Learning methods such as Deep Learning, to handle the many-sided complexity and variability of human speech. This lets them learn from large speech datasets, adapt to different languages and accents, and improve their performance over time, making them increasingly versatile and effective at transcribing spoken language into text. In the same way, we argue that the noise typical of different environments must also be handled explicitly and, when possible, modeled in dedicated datasets used for training. Our motivation is the observation that noise removal techniques (commonly called denoising) are not always fully effective, nor do they work equally well across all kinds of noise. For instance, the degradation caused by communication interference, which is almost always present in radio transmissions, has peculiarities that a simple mathematical formulation cannot model. This work presents a modeling technique, composed of an augmented dataset-building approach and a profile identifier, for building ASRs that perform in noisy environments similarly to ASRs used in noise-free environments. As a case study, we developed an ASR for interference noise in radio transmissions, together with its own dataset, and compared our results with other state-of-the-art work. We report a Character Error Rate (CER) of 0.3163 for the developed ASR across several different noise conditions.
ISSN:	2763-7719
DOI:	10.5753/jis.2024.4267
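The summary reports a Character Error Rate (CER) of 0.3163. CER is conventionally computed as the character-level edit distance (substitutions + deletions + insertions) between the reference transcription and the ASR output, divided by the number of characters in the reference, so 0.3163 corresponds to roughly 32 character edits per 100 reference characters. The Python sketch below illustrates that standard definition only. The function names are hypothetical, and it assumes no text normalization (case folding, punctuation removal); the authors' exact evaluation setup may differ.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Character-level edit distance: the minimum number of substitutions,
    deletions and insertions that turn `ref` into `hyp`."""
    prev = list(range(len(hyp) + 1))  # distances from the empty prefix of ref
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h from the hypothesis
                            prev[j - 1] + cost))  # substitute (or match) r -> h
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / number of reference characters.
    Assumption: no normalization is applied to either string."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of 7 reference characters.
    print(round(character_error_rate("bom dia", "bom tia"), 4))  # 0.1429
```

Because insertions are counted, CER is not bounded by 1: a hypothesis much longer than its reference can score above 1.0.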