Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention
Speech enhancement is crucial in various applications, such as speech recognition, hearing aids, and telecommunications-VoIP. This paper presents a novel time-frequency domain speech enhancement framework utilizing an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) lay...
Published in | 2023 8th International Conference on Computers and Devices for Communication (CODEC) pp. 1 - 2 |
---|---|
Main Authors | Samui, Suman; Garai, Soumen |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 14.12.2023 |
DOI | 10.1109/CODEC60112.2023.10465846 |
Abstract | Speech enhancement is crucial in various applications, such as speech recognition, hearing aids, and telecommunications-VoIP. This paper presents a novel time-frequency domain speech enhancement framework utilizing an Audio Spectrogram Transformer (AST) with a Masked Multi-head Attention (MMHAt) layer. The AST is a convolution-free deep learning architecture, and it is directly applied to an audio spectrogram. Its multi-head attention layer can capture the long-range global context in the time-frequency domain. Moreover, the masking has been applied to ensure the causality of the Multi-Head Attention mechanism, which is essential for real-time applications. |
---|---|
Author | Samui, Suman Garai, Soumen |
Author_xml | 1. Suman Samui (samuisuman@gmail.com), Department of Electronics and Communication Engineering, National Institute of Technology, Durgapur, West Bengal, India, 713209; 2. Soumen Garai (soumengoroi@gmail.com), Department of Electronics and Communication Engineering, National Institute of Technology, Durgapur, West Bengal, India, 713209 |
ContentType | Conference Proceeding |
EISBN | 9798350317176 |
EndPage | 2 |
ExternalDocumentID | 10465846 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
PageCount | 2 |
PublicationCentury | 2000 |
PublicationDate | 2023-Dec.-14 |
PublicationDateYYYYMMDD | 2023-12-14 |
PublicationDecade | 2020 |
PublicationTitle | 2023 8th International Conference on Computers and Devices for Communication (CODEC) |
PublicationTitleAbbrev | CODEC |
PublicationYear | 2023 |
Publisher | IEEE |
StartPage | 1 |
SubjectTerms | Audio Spectrogram Transformer; Mean square error methods; Multi-head Attention; Performance evaluation; Speech coding; Speech enhancement; Speech processing; Speech recognition; Time-frequency analysis; Transformer; Transformers |
Title | Time-Frequency Domain Speech Enhancement Framework using Audio Spectrogram Transformer with Masked Multi-head Attention |
URI | https://ieeexplore.ieee.org/document/10465846 |
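
The abstract states that masking is applied to the multi-head attention layer so that it stays causal, which matters for real-time use. The sketch below illustrates that general idea only and is not the authors' implementation: it assumes a single attention head, no learned projections, and plain Python lists of per-frame feature vectors. The names `causal_mask`, `softmax`, and `masked_attention` are hypothetical and chosen for illustration.

```python
import math


def causal_mask(num_frames):
    """Lower-triangular mask: frame t may attend to frames 0..t only."""
    return [[s <= t for s in range(num_frames)] for t in range(num_frames)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(x - peak) for x in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(queries, keys, values):
    """Single-head scaled dot-product attention with a causal mask.

    Each argument is a list of per-frame feature vectors (assumed layout).
    Returns the attended outputs and the attention weights.
    """
    dim = len(queries[0])
    mask = causal_mask(len(queries))
    outputs, weights = [], []
    for t, q in enumerate(queries):
        scores = []
        for s, k in enumerate(keys):
            if mask[t][s]:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(dim))
            else:
                # Future frames get -inf so their softmax weight is exactly 0.
                scores.append(float("-inf"))
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[s] * values[s][d] for s in range(len(values)))
             for d in range(len(values[0]))]
        )
    return outputs, weights


# Three toy "spectrogram frames" with two features each.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = masked_attention(frames, frames, frames)
```

In this toy run the first frame can only attend to itself, so its output equals its own value vector, and every weight above the diagonal is zero. A multi-head layer would repeat the same masked computation per head on separately projected inputs.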