Noise-Robust Automatic Speech Recognition: A Case Study for Communication Interference
Published in | Journal on Interactive Systems Vol. 15; no. 1; pp. 670 - 681 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Brazilian Computer Society 09.07.2024 |
Subjects | |
Summary: | An Automatic Speech Recognition (ASR) system is a software tool that converts a speech audio waveform into its corresponding text transcription. ASR systems are usually built with Artificial Intelligence techniques, particularly Machine Learning approaches such as Deep Learning, to address the multi-faceted complexity and variability of human speech. This allows them to learn from extensive speech datasets, adapt to many languages and accents, and improve over time, making them increasingly versatile and effective at transcribing spoken language to text. In the same way, we argue that the noise commonly present in different environments also needs to be dealt with explicitly and, when possible, modeled within specific datasets with proper training. Our motivation comes from the observation that noise removal techniques (commonly called denoising) are not always fully, or generically, effective. For instance, the degradation caused by communication interference, which is almost always present in radio transmissions, has peculiarities that a simple mathematical formulation cannot model. This work presents a modeling technique, composed of an augmented dataset-building approach and a profile identifier, that can be used to build ASRs for noisy environments that perform similarly to those used in noise-free environments. As a case study, we developed an ASR for interference noise in radio transmissions, together with its own dataset, and compared our results with other state-of-the-art work. We report a Character Error Rate of 0.3163 for the developed ASR under several different noise conditions. |
---|---|
ISSN: | 2763-7719 |
DOI: | 10.5753/jis.2024.4267 |
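
The summary reports its result as a Character Error Rate (CER) but does not say how the metric was computed. The sketch below assumes the common definition: the character-level Levenshtein (edit) distance between the reference transcription and the ASR output, divided by the number of characters in the reference. The function names `levenshtein` and `character_error_rate` are illustrative and do not come from the article, and any text normalization the authors may have applied (casing, punctuation) is not modeled here.

```python
def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of character insertions, deletions and
    substitutions needed to turn `ref` into `hyp`."""
    # prev[j] holds the distance between the first i-1 characters of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from ref
                curr[j - 1] + 1,     # insert h into ref
                prev[j - 1] + cost,  # substitute r with h (or match)
            ))
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER under the assumed definition: edit distance / reference length."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return levenshtein(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcriptions, not taken from the article's dataset.
    print(character_error_rate("abcd", "abxd"))
```

Under this definition, a CER of 0.3163 would correspond to roughly 32 character edits per 100 reference characters. Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.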