Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention

Bibliographic Details
Published in	2023 8th International Conference on Computers and Devices for Communication (CODEC) pp. 1 - 2
Main Authors	Samui, Suman; Garai, Soumen
Format	Conference Proceeding
Language	English
Published	IEEE 14.12.2023
Subjects	Audio Spectrogram Transformer; Mean square error methods; Multi-head Attention; Performance evaluation; Speech coding; Speech enhancement; Speech processing; Speech recognition; Time-frequency analysis; Transformer; Transformers
DOI	10.1109/CODEC60112.2023.10465846

Abstract	Speech enhancement is crucial in applications such as speech recognition, hearing aids, and telecommunications (e.g., VoIP). This paper presents a novel time-frequency domain speech enhancement framework that uses an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) layer. The AST is a convolution-free deep learning architecture that is applied directly to an audio spectrogram. Its multi-head attention layer can capture long-range global context in the time-frequency domain. Masking is applied to make the multi-head attention mechanism causal, which is essential for real-time applications.
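The record gives no implementation details, so the following is only a minimal sketch of what causal masking in multi-head attention generally looks like, not the authors' method. It assumes the spectrogram is a list of time frames, uses identity query/key/value projections for brevity (a trained model would learn these), and all function names (`causal_mask`, `masked_attention`, `masked_multi_head_attention`) are hypothetical.

```python
import math

def causal_mask(num_frames):
    """mask[t][s] is True when frame t may attend to frame s (s <= t)."""
    return [[s <= t for s in range(num_frames)] for t in range(num_frames)]

def softmax(scores):
    """Numerically stable softmax; -inf scores receive zero weight."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v are lists of T frames, each frame a list of floats.
    """
    num_frames, dim = len(q), len(q[0])
    mask = causal_mask(num_frames)
    out = []
    for t in range(num_frames):
        # Future frames (s > t) get a score of -inf, hence zero weight.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(dim)
            if mask[t][s] else float("-inf")
            for s in range(num_frames)
        ]
        weights = softmax(scores)
        out.append([
            sum(weights[s] * v[s][j] for s in range(num_frames))
            for j in range(len(v[0]))
        ])
    return out

def masked_multi_head_attention(x, num_heads):
    """Split features into heads, attend per head, concatenate the results.

    Assumption: identity projections stand in for learned weight matrices.
    """
    dim = len(x[0])
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads
    heads = []
    for i in range(num_heads):
        xi = [frame[i * head_dim:(i + 1) * head_dim] for frame in x]
        heads.append(masked_attention(xi, xi, xi))
    return [
        sum((heads[i][t] for i in range(num_heads)), [])
        for t in range(len(x))
    ]

# Three frames with four features each (made-up illustrative values).
frames = [[1.0, 0.0, 0.5, 2.0], [0.0, 1.0, 1.5, -1.0], [2.0, 2.0, 0.0, 0.5]]
enhanced = masked_multi_head_attention(frames, num_heads=2)
```

Because each output frame depends only on the current and earlier input frames, changing a later frame leaves earlier outputs untouched. This is the property that makes frame-by-frame (real-time) processing possible.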
Author	Samui, Suman; Garai, Soumen
Author_xml	– sequence: 1; givenname: Suman; surname: Samui; fullname: Samui, Suman; email: samuisuman@gmail.com; organization: National Institute of Technology, Department of Electronics and Communication Engineering, Durgapur, West Bengal, India, 713209
	– sequence: 2; givenname: Soumen; surname: Garai; fullname: Garai, Soumen; email: soumengoroi@gmail.com; organization: National Institute of Technology, Department of Electronics and Communication Engineering, Durgapur, West Bengal, India, 713209
ContentType	Conference Proceeding
DBID	6IE 6IL CBEJK RIE RIL
DOI	10.1109/CODEC60112.2023.10465846
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
EISBN	9798350317176
EndPage	2
ExternalDocumentID	10465846
Genre	orig-research
GroupedDBID	6IE 6IL CBEJK RIE RIL
ID	FETCH-LOGICAL-i119t-6061ed0849a21406ca6ad050dc44e9aadb55652c6539f99eab11e29c3314d0f3
IEDL.DBID	RIE
IngestDate	Wed May 01 11:50:10 EDT 2024
IsPeerReviewed	false
IsScholarly	false
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i119t-6061ed0849a21406ca6ad050dc44e9aadb55652c6539f99eab11e29c3314d0f3
PageCount	2
ParticipantIDs	ieee_primary_10465846
PublicationCentury	2000
PublicationDate	2023-Dec.-14
PublicationDateYYYYMMDD	2023-12-14
PublicationDate_xml	– month: 12 year: 2023 text: 2023-Dec.-14 day: 14
PublicationDecade	2020
PublicationTitle	2023 8th International Conference on Computers and Devices for Communication (CODEC)
PublicationTitleAbbrev	CODEC
PublicationYear	2023
Publisher	IEEE
Publisher_xml	– name: IEEE
Score	1.8565527
SourceID	ieee
SourceType	Publisher
StartPage	1
SubjectTerms	Audio Spectrogram Transformer; Mean square error methods; Multi-head Attention; Performance evaluation; Speech coding; Speech enhancement; Speech processing; Speech recognition; Time-frequency analysis; Transformer; Transformers
Title	Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention
URI	https://ieeexplore.ieee.org/document/10465846
hasFullText	1
inHoldings	1
isFullTextHit
isPrint
linkProvider	IEEE
openUrl	ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=2023+8th+International+Conference+on+Computers+and+Devices+for+Communication+%28CODEC%29&rft.atitle=Time-Frequency+Domain+Speech+Enhancement+Framework+using+Audio+Spectrogram+Transformer+with+Masked+Multi-head+Attention&rft.au=Samui%2C+Suman&rft.au=Garai%2C+Soumen&rft.date=2023-12-14&rft.pub=IEEE&rft.spage=1&rft.epage=2&rft_id=info:doi/10.1109%2FCODEC60112.2023.10465846&rft.externalDocID=10465846