Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention

Speech enhancement is crucial in various applications, such as speech recognition, hearing aids, and telecommunications-VoIP. This paper presents a novel time-frequency domain speech enhancement framework utilizing an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) lay...

Bibliographic Details
Published in 2023 8th International Conference on Computers and Devices for Communication (CODEC), pp. 1-2
Main Authors Samui, Suman, Garai, Soumen
Format Conference Proceeding
Language English
Published IEEE 14.12.2023
Subjects
DOI 10.1109/CODEC60112.2023.10465846

Abstract Speech enhancement is crucial in applications such as speech recognition, hearing aids, and telecommunications (e.g., VoIP). This paper presents a novel time-frequency domain speech enhancement framework that uses an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) layer. The AST is a convolution-free deep learning architecture that is applied directly to an audio spectrogram. Its multi-head attention layer can capture long-range global context in the time-frequency domain. Masking is applied to make the multi-head attention mechanism causal, which is essential for real-time applications.
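The causal masking mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, uses the input frames directly as queries, keys, and values (no learned projections), and the function name `causal_attention` is hypothetical. Restricting each time frame to the frames at or before it is equivalent to setting the masked attention scores to negative infinity before the softmax.

```python
import math

def causal_attention(frames):
    """Single-head self-attention over time frames with a causal mask.

    frames: list of equal-length lists of floats, one feature vector per
    time frame. Illustrative assumption: queries, keys, and values are the
    frames themselves, with no learned projection matrices.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for t, query in enumerate(frames):
        # Causal mask: frame t may only attend to frames 0..t.
        visible = frames[: t + 1]
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in visible]
        # Numerically stable softmax over the visible frames only.
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted average of the visible frames.
        outputs.append([
            sum(w * value[d] for w, value in zip(weights, visible))
            for d in range(dim)
        ])
    return outputs
```

Because output frame t depends only on input frames 0 through t, replacing a later frame leaves every earlier output unchanged, which is the property a real-time (streaming) enhancer needs.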
Author Samui, Suman
Garai, Soumen
Author_xml – sequence: 1
  givenname: Suman
  surname: Samui
  fullname: Samui, Suman
  email: samuisuman@gmail.com
  organization: National Institute of Technology,Department of Electronics and Communication Engineering,Durgapur,West Bengal,India,713209
– sequence: 2
  givenname: Soumen
  surname: Garai
  fullname: Garai, Soumen
  email: soumengoroi@gmail.com
  organization: National Institute of Technology,Department of Electronics and Communication Engineering,Durgapur,West Bengal,India,713209
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/CODEC60112.2023.10465846
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Xplore
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 9798350317176
EndPage 2
ExternalDocumentID 10465846
Genre orig-research
GroupedDBID 6IE
6IL
CBEJK
RIE
RIL
ID FETCH-LOGICAL-i119t-6061ed0849a21406ca6ad050dc44e9aadb55652c6539f99eab11e29c3314d0f3
IEDL.DBID RIE
IngestDate Wed May 01 11:50:10 EDT 2024
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-i119t-6061ed0849a21406ca6ad050dc44e9aadb55652c6539f99eab11e29c3314d0f3
PageCount 2
ParticipantIDs ieee_primary_10465846
PublicationCentury 2000
PublicationDate 2023-Dec.-14
PublicationDateYYYYMMDD 2023-12-14
PublicationDate_xml – month: 12
  year: 2023
  text: 2023-Dec.-14
  day: 14
PublicationDecade 2020
PublicationTitle 2023 8th International Conference on Computers and Devices for Communication (CODEC)
PublicationTitleAbbrev CODEC
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
Score 1.8565527
Snippet Speech enhancement is crucial in various applications, such as speech recognition, hearing aids, and telecommunications-VoIP. This paper presents a novel...
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms Audio Spectrogram Transformer
Mean square error methods
Multi-head Attention
Performance evaluation
Speech coding
Speech enhancement
Speech processing
Speech recognition
Time-frequency analysis
Transformer
Transformers
Title Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention
URI https://ieeexplore.ieee.org/document/10465846
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1LSwMxEA62J08qVnyTg9dsk33ZHEvbpQitghV6K3lMbCm2pe4i-uvN7HYVBcFTlhA2IQOZ-ZLvmyHkJtVR4lzKmU0kZ3FiOOuAE0wbFUohrXUa1cijcTp8iu-myXQnVi-1MABQks8gwM_yLd-uTYFXZW18j0SH2SANj9wqsVbNzuGy3bvvD3oeYAgUWIVRUA__UTil9BvZARnXM1Z0kWVQ5DowH7-SMf57SYek9S3Row9fzueI7MHqmLyhoINl24oe_U776xcP_OnjBsDM6WA1RxPjD2lWc7IoEt-fabewizUOxJo4SNiikzqihS3Fy1o6Uq9LsLRU7DJ_hFvazfOKLNkik2ww6Q3ZrrICWwghc-ZRiwDLO7FUoUdYqVGpsjzh1sQxSKWsTnygFxrMW-ukBKWFgFCaKBKx5S46Ic3VegWnhGoXKnBJlHSM82eB1B0OqfAthkJO356RFm7abFPlzpjV-3X-R_8F2UfbIWFExJekmW8LuPJuP9fXpbk_ASQosJ0
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjZ1LSwMxEMeDj4OeVKz4NgevqcnuZm2O0gf10SpYobeSx8SWYlvqFtFPb2a3qygInnZZ9kUGMjPJ7z9DyHlqYul9ypmTirNEWs5q4AUzVkdKKOe8QTVyp5u2n5Kbvuwvxeq5FgYAcvgMqnia7-W7qV3gUtkF7keiw1wl68HxS1HItUo-h6uL-n2jWQ8phkCJVRRXywd-tE7JPUdri3TLbxbAyLi6yEzVfvwqx_jvn9omlW-RHn34cj87ZAUmu-QNJR2sNS8A6XfamL6E1J8-zgDskDYnQzQyvpC2SiqLIvr-TK8WbjTFG7ErDiJbtFfGtDCnuFxLO_p1DI7mml0WJnFHr7KswCUrpNdq9upttuytwEZCqIyFvEWA47VE6SjkWKnVqXZccmeTBJTWzsgQ6kUWK9d6pUAbISBSNo5F4riP98jaZDqBfUKNjzR4Gcua9WE2UKbGIRXhiMGQN5cHpIKDNpgV1TMG5Xgd_nH9jGy0e527wd119_aIbKIdER8RyTFZy-YLOAlBQGZOc9N_AoaQs-Y
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2023+8th+International+Conference+on+Computers+and+Devices+for+Communication+%28CODEC%29&rft.atitle=Time-Frequency+Domain+Speech+Enhancement+Framework+using+Audio+Spectrogram+Transformer+with+Masked+Multi-head+Attention&rft.au=Samui%2C+Suman&rft.au=Garai%2C+Soumen&rft.date=2023-12-14&rft.pub=IEEE&rft.spage=1&rft.epage=2&rft_id=info:doi/10.1109%2FCODEC60112.2023.10465846&rft.externalDocID=10465846