Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention


Bibliographic Details
Published in	2023 8th International Conference on Computers and Devices for Communication (CODEC), pp. 1-2
Main Authors	Samui, Suman; Garai, Soumen
Format	Conference Proceeding
Language	English
Published	IEEE 14.12.2023
Subjects	Audio Spectrogram Transformer; Mean square error methods; Multi-head Attention; Performance evaluation; Speech coding; Speech enhancement; Speech processing; Speech recognition; Time-frequency analysis; Transformer; Transformers

Summary:	Speech enhancement is crucial in applications such as speech recognition, hearing aids, and telecommunications (VoIP). This paper presents a novel time-frequency domain speech enhancement framework built on an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) layer. The AST is a convolution-free deep learning architecture that is applied directly to an audio spectrogram. Its multi-head attention layer can capture long-range global context in the time-frequency domain. Masking is applied to make the multi-head attention mechanism causal, which is essential for real-time applications.
DOI:	10.1109/CODEC60112.2023.10465846
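
The summary's point about masking can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes the standard lower-triangular (causal) mask over time frames, so that each frame attends only to itself and earlier frames. The function names, the weight matrices, and the (frames, features) input layout are hypothetical choices for illustration; the record does not specify the paper's tokenization or hyperparameters.

```python
import numpy as np

def causal_mask(num_frames):
    """Boolean mask: True where frame i may attend to frame j (j <= i)."""
    return np.tril(np.ones((num_frames, num_frames), dtype=bool))

def masked_multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Causal multi-head self-attention over x of shape (T, D).

    T is the number of time frames and D the embedding size
    (assumed divisible by num_heads). All weights have shape (D, D).
    """
    T, D = x.shape
    d_head = D // num_heads

    def split(w):
        # Project, then split into heads: (num_heads, T, d_head)
        return (x @ w).reshape(T, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(w_q), split(w_k), split(w_v)

    # Scaled dot-product scores per head: (num_heads, T, T)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Block attention to future frames by setting their scores to -inf
    scores = np.where(causal_mask(T), scores, -np.inf)

    # Numerically stable softmax over the key axis; masked entries become 0
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values, then merge heads back to (T, D)
    out = (weights @ v).transpose(1, 0, 2).reshape(T, D)
    return out @ w_o
```

Under this assumption, the output for frame t depends only on frames 0..t, so changing a later frame leaves earlier outputs unchanged. That property is what allows frame-by-frame processing in a real-time setting.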