Towards Efficient Models for Real-Time Deep Noise Suppression


Bibliographic Details
Published in: ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 656-660
Main Authors: Braun, Sebastian; Gamper, Hannes; Reddy, Chandan K.A.; Tashev, Ivan
Format: Conference Proceeding
Language: English
Published: IEEE, 06.06.2021

Summary: With recent research advancements, deep learning models are becoming attractive and powerful choices for speech enhancement in real-time applications. While state-of-the-art models can achieve outstanding results in terms of speech quality and background noise reduction, the main challenge is to obtain compact models that are resource-efficient at inference time. An important but often neglected aspect of data-driven methods is that results are only convincing when tested on real-world data and evaluated with meaningful metrics. In this work, we investigate reasonably small recurrent and convolutional-recurrent network architectures for speech enhancement, trained on a large dataset that also includes reverberation. We show interesting tradeoffs between computational complexity and the achievable speech quality, measured on real recordings using a highly accurate MOS estimator. The achievable speech quality is shown to be a function of network complexity, and we identify which models offer the better tradeoffs.
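The complexity-versus-quality tradeoff discussed in the summary is typically quantified by counting parameters (and multiply-accumulates) of the recurrent core. As a minimal sketch, the snippet below counts parameters of a GRU layer using the standard gate parameterization (three gates, each with input weights, recurrent weights, and one bias vector; some frameworks add a second bias vector). The layer sizes compared are hypothetical illustrations, not the configurations evaluated in the paper.

```python
def gru_params(input_size: int, hidden_size: int) -> int:
    """Parameters of one GRU layer: three gates (update, reset, candidate),
    each with an input weight matrix, a recurrent weight matrix, and a bias."""
    return 3 * (input_size * hidden_size
                + hidden_size * hidden_size
                + hidden_size)

# Hypothetical 2-layer GRU cores operating on 257-bin magnitude spectra
# (illustrative sizes only, not the paper's actual models):
small = gru_params(257, 256) + gru_params(256, 256)
large = gru_params(257, 512) + gru_params(512, 512)
print(f"small core: {small / 1e6:.2f} M params")
print(f"large core: {large / 1e6:.2f} M params")
```

Doubling the hidden size roughly quadruples the recurrent-weight term, which is why small increases in network width can dominate the inference cost of real-time models.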
ISSN: 2379-190X
DOI: 10.1109/ICASSP39728.2021.9413580