Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention
Published in | 2023 8th International Conference on Computers and Devices for Communication (CODEC) pp. 1 - 2 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 14.12.2023 |
Summary: | Speech enhancement is crucial in applications such as speech recognition, hearing aids, and telecommunications (VoIP). This paper presents a novel time-frequency domain speech enhancement framework that uses an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) layer. The AST is a convolution-free deep learning architecture applied directly to an audio spectrogram. Its multi-head attention layer can capture long-range global context in the time-frequency domain. Masking is applied to make the multi-head attention mechanism causal, which is essential for real-time applications. |
---|---|
DOI: | 10.1109/CODEC60112.2023.10465846 |
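
The causal masking described in the summary can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes one token per spectrogram time frame, uses untrained random projection weights, and the names `init_params` and `causal_mha` are hypothetical. It only shows how an upper-triangular mask stops each frame from attending to later frames.

```python
import numpy as np

def init_params(dim, seed=0):
    """Hypothetical untrained projection weights for Q, K, V and output."""
    rng = np.random.default_rng(seed)
    return {name: rng.standard_normal((dim, dim)) / np.sqrt(dim)
            for name in ("q", "k", "v", "o")}

def causal_mha(x, params, num_heads):
    """Masked multi-head self-attention over x of shape (frames, dim).

    Assumption: each row of x is the embedding of one time frame.
    Returns the output (frames, dim) and weights (heads, frames, frames).
    """
    n_frames, dim = x.shape
    assert dim % num_heads == 0, "dim must be divisible by num_heads"
    d_head = dim // num_heads

    def split(h):
        # (frames, dim) -> (heads, frames, d_head)
        return h.reshape(n_frames, num_heads, d_head).transpose(1, 0, 2)

    q = split(x @ params["q"])
    k = split(x @ params["k"])
    v = split(x @ params["v"])

    # Scaled dot-product scores, shape (heads, frames, frames).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)

    # Causal mask: True strictly above the diagonal marks future frames.
    future = np.triu(np.ones((n_frames, n_frames), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax; masked entries become exactly zero.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Merge heads back to (frames, dim) and apply the output projection.
    merged = (weights @ v).transpose(1, 0, 2).reshape(n_frames, dim)
    return merged @ params["o"], weights

# Usage: perturbing future frames must not change earlier outputs.
params = init_params(dim=8)
frames = np.random.default_rng(1).standard_normal((6, 8))
out, attn = causal_mha(frames, params, num_heads=2)

perturbed = frames.copy()
perturbed[4:] += 1.0  # modify only frames 4 and 5
out_perturbed, _ = causal_mha(perturbed, params, num_heads=2)
```

Because of the mask, the outputs for frames 0 to 3 are the same in both calls, which is the property that allows frame-by-frame processing in a real-time setting.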