A Lightweight CNN-Conformer Model for Automatic Speaker Verification

Bibliographic Details
Published in	IEEE Signal Processing Letters, Vol. 31, pp. 1-5
Main Authors	Wang, Hao; Lin, Xiaobing; Zhang, Jiashu
Format	Journal Article
Language	English
Published	New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.01.2024
Subjects	Artificial neural networks; Computational modeling; Computer architecture; conformer; Convolution; Convolutional neural networks; Feature extraction; Lightweight; lightweight model; network architecture; Reduction; Speaker verification; Task analysis; Transformers; Verification
Online Access	Get full text

Abstract	Recently, Conformer has achieved tremendous success in the speaker verification task. It demonstrates that Transformer-based models can achieve remarkable performance in this domain without intricate pre-training procedures. However, its special macaron-style feed-forward module introduces prohibitive computing and memory overhead. Speaker verification is often deployed in resource-constrained embedded environments such as smartphones, where little memory is available. In light of this, we propose two approaches to compress the Conformer-based system while maintaining its performance. First, we introduce a lightweight Convolutional Neural Network (CNN) front-end with channel-frequency attention to replace the shallow Conformer blocks. This substitution aims to extract more informative speaker characteristics for subsequent processing. Second, we introduce a light Feed-forward Network (FFN) based on depth-wise separable convolution to reduce the model size of the Conformer blocks. To better demonstrate the effectiveness of our model, we evaluate it on three different test sets. By incorporating these two approaches, we achieve an Equal Error Rate (EER) of 0.61% on VoxCeleb-O, surpassing the previous state-of-the-art Transformer-based model, MFA-Conformer. Moreover, our model achieves a 60.6% reduction in parameters and a 36.8% reduction in FLOPs compared with MFA-Conformer.
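The abstract attributes part of the size reduction to replacing the feed-forward network with one built on depth-wise separable convolution. As a rough illustration of why that can shrink a block, the sketch below counts parameters for a standard two-layer FFN and for a single depth-wise separable 1-D convolution. The model width (256), expansion factor (4), and kernel size (3) are hypothetical values chosen for the arithmetic; they are not taken from the paper, and the paper's actual light FFN layout may differ, so the resulting ratio should not be compared with the reported 60.6% figure.

```python
def ffn_params(d_model: int, expansion: int = 4) -> int:
    """Weights and biases of a standard two-layer position-wise FFN."""
    d_hidden = d_model * expansion
    up = d_model * d_hidden + d_hidden      # first linear layer
    down = d_hidden * d_model + d_model     # second linear layer
    return up + down


def depthwise_separable_params(c_in: int, c_out: int, kernel_size: int) -> int:
    """Weights and biases of a 1-D depth-wise conv followed by a point-wise (1x1) conv."""
    depthwise = c_in * kernel_size + c_in   # one filter per input channel
    pointwise = c_in * c_out + c_out        # 1x1 conv mixes channels
    return depthwise + pointwise


# Hypothetical sizes, for illustration only (not from the paper).
d_model = 256
standard = ffn_params(d_model, expansion=4)
light = depthwise_separable_params(d_model, d_model, kernel_size=3)
reduction = 1 - light / standard

print(f"standard FFN parameters:        {standard}")
print(f"depth-wise separable parameters: {light}")
print(f"relative reduction:              {reduction:.1%}")
```

The saving comes from the depth-wise step, whose cost grows with the channel count times the kernel size rather than with the square of the channel count; only the point-wise step mixes channels.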
Author	1. Wang, Hao (ORCID 0009-0006-9720-203X), School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; 2. Lin, Xiaobing (ORCID 0000-0002-9627-2451), School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China; 3. Zhang, Jiashu (ORCID 0000-0003-2086-0991), School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
CODEN	ISPLEM
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml	– notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI	10.1109/LSP.2023.3342714
Discipline	Engineering
EISSN	1558-2361
EndPage	5
ExternalDocumentID	10_1109_LSP_2023_3342714 10356746
Genre	orig-research
GrantInformation	Sichuan Science and Technology Program (grant 2022NSFSC0531)
IEDL.DBID	RIE
ISSN	1070-9908
IsPeerReviewed	true
IsScholarly	true
Language	English
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037
ORCID	0000-0002-9627-2451 0009-0006-9720-203X 0000-0003-2086-0991
PageCount	5
PublicationDate	2024-01-01
PublicationPlace	New York
PublicationTitle	IEEE signal processing letters
PublicationTitleAbbrev	LSP
PublicationYear	2024
Publisher	IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage	1
URI	https://ieeexplore.ieee.org/document/10356746 https://www.proquest.com/docview/2911492440
Volume	31