A Lightweight CNN-Conformer Model for Automatic Speaker Verification
Recently, the Conformer has achieved tremendous success in the speaker verification task. It demonstrates that a Transformer-based model can achieve remarkable performance in this domain, bypassing the need for intricate pre-training procedures. However, its special macaron-style feed-forward module introduce...
Published in | IEEE signal processing letters Vol. 31; pp. 1 - 5 |
---|---|
Main Authors | Wang, Hao; Lin, Xiaobing; Zhang, Jiashu |
Format | Journal Article |
Language | English |
Published | New York; IEEE; 01.01.2024; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | Recently, the Conformer has achieved tremendous success in the speaker verification task. It demonstrates that a Transformer-based model can achieve remarkable performance in this domain, bypassing the need for intricate pre-training procedures. However, its macaron-style feed-forward module introduces prohibitive computing and memory overhead. Speaker verification is often deployed in resource-constrained embedded environments such as smartphones, where little memory is available. In light of this, we propose two approaches to compress the Conformer-based system while maintaining its performance. First, we introduce a lightweight Convolutional Neural Network (CNN) front-end with channel-frequency attention to replace the shallow Conformer blocks. This substitution aims to extract more informative speaker characteristics for subsequent processing. Second, we introduce a light Feed-forward Network (FFN) based on depth-wise separable convolution to reduce the size of the Conformer blocks. To better demonstrate the effectiveness of our model, we evaluate it on three different test sets. By combining these two approaches, we achieve an Equal Error Rate (EER) of 0.61% on VoxCeleb-O, surpassing the previous state-of-the-art Transformer-based model, MFA-Conformer. Moreover, our model achieves a 60.6% reduction in parameters and a 36.8% reduction in FLOPs compared with MFA-Conformer. |
---|---|
Author | Wang, Hao Lin, Xiaobing Zhang, Jiashu |
Author_xml | 1. Wang, Hao (ORCID 0009-0006-9720-203X), School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; 2. Lin, Xiaobing (ORCID 0000-0002-9627-2451), School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; 3. Zhang, Jiashu (ORCID 0000-0003-2086-0991), School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China |
CODEN | ISPLEM |
CitedBy_id | crossref_primary_10_1109_TASLP_2024_3492793 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
DOI | 10.1109/LSP.2023.3342714 |
Discipline | Engineering |
EISSN | 1558-2361 |
EndPage | 5 |
ExternalDocumentID | 10_1109_LSP_2023_3342714 10356746 |
Genre | orig-research |
GrantInformation_xml | – fundername: Sichuan Science and Technology Program grantid: 2022NSFSC0531 |
ISSN | 1070-9908 |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
ORCID | 0000-0002-9627-2451 0009-0006-9720-203X 0000-0003-2086-0991 |
PQID | 2911492440 |
PQPubID | 75747 |
PageCount | 5 |
PublicationCentury | 2000 |
PublicationDate | 2024-01-01 |
PublicationDateYYYYMMDD | 2024-01-01 |
PublicationDate_xml | – month: 01 year: 2024 text: 2024-01-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationTitle | IEEE signal processing letters |
PublicationTitleAbbrev | LSP |
PublicationYear | 2024 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1 |
SubjectTerms | Artificial neural networks Computational modeling Computer architecture conformer Convolution Convolutional neural networks Feature extraction Lightweight lightweight model network architecture Reduction Speaker verification Task analysis Transformers Verification |
Title | A Lightweight CNN-Conformer Model for Automatic Speaker Verification |
URI | https://ieeexplore.ieee.org/document/10356746 https://www.proquest.com/docview/2911492440 |
Volume | 31 |
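
The abstract attributes part of the size reduction to a feed-forward network built on depth-wise separable convolution. As a rough illustration of why that factorisation saves parameters, the sketch below counts weights for a dense 1-D convolution and for its depth-wise separable counterpart. The function names and the layer sizes (256 channels, kernel size 3) are hypothetical choices made for this example; they are not taken from the paper, and the paper's actual FFN layout may differ.

```python
# Illustrative parameter arithmetic only. The sizes used below are assumptions,
# not the configuration reported in the paper.


def standard_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Weights plus biases of a dense 1-D convolution."""
    return c_in * c_out * kernel + c_out


def depthwise_separable_conv1d_params(c_in: int, c_out: int, kernel: int) -> int:
    """Depth-wise conv (one filter per input channel) followed by a 1x1 point-wise conv."""
    depthwise = c_in * kernel + c_in   # per-channel filters + biases
    pointwise = c_in * c_out + c_out   # 1x1 channel mixing + biases
    return depthwise + pointwise


def reduction(before: int, after: int) -> float:
    """Fractional saving, e.g. 0.25 means 25% fewer parameters."""
    return 1.0 - after / before


if __name__ == "__main__":
    c_in, c_out, kernel = 256, 256, 3  # hypothetical layer sizes
    dense = standard_conv1d_params(c_in, c_out, kernel)
    separable = depthwise_separable_conv1d_params(c_in, c_out, kernel)
    print(f"dense: {dense}, separable: {separable}, "
          f"saving: {reduction(dense, separable):.1%}")
```

The saving grows with the kernel size and the channel count, because the dense layer multiplies all three factors while the separable version only adds a per-channel term to the point-wise cost. The 60.6% figure quoted in the abstract is a whole-model number and should not be read off this single-layer arithmetic.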