A Lightweight CNN-Conformer Model for Automatic Speaker Verification

Bibliographic Details
Published in IEEE Signal Processing Letters Vol. 31; pp. 1-5
Main Authors Wang, Hao; Lin, Xiaobing; Zhang, Jiashu
Format Journal Article
Language English
Published New York IEEE 01.01.2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Recently, Conformer has achieved tremendous success in the speaker verification task. It demonstrates that Transformer-based models can achieve remarkable performance in this domain without intricate pre-training procedures. However, its macaron-style feed-forward module introduces prohibitive computing and memory overhead. Speaker verification is often deployed in resource-constrained embedded environments such as smartphones, where little memory is available. In light of this, we propose two approaches to compress the Conformer-based system while maintaining its performance. First, we introduce a lightweight Convolutional Neural Network (CNN) front-end with channel-frequency attention to replace the shallow Conformer blocks. This substitution aims to extract more informative speaker characteristics for subsequent processing. Second, we introduce a light Feed-Forward Network (FFN) based on depth-wise separable convolution to reduce the model size of the Conformer blocks. To better demonstrate the effectiveness of our model, we evaluate it on three different test sets. By combining these two approaches, we achieve an Equal Error Rate (EER) of 0.61% on VoxCeleb-O, surpassing the previous state-of-the-art Transformer-based model, MFA-Conformer. Moreover, our model achieves a 60.6% reduction in parameters and a 36.8% reduction in FLOPs compared with MFA-Conformer.
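The abstract attributes part of the size reduction to replacing the feed-forward module with one built on depth-wise separable convolution. The record does not give the layer layout, so the following is only a rough parameter-count sketch under assumed values: the width `d_model = 256`, the expansion factor 4, the kernel size 3, and the single depthwise-plus-pointwise layout are all hypothetical, and the function names are illustrative. It is not the authors' architecture and does not reproduce the reported 60.6% whole-model figure.

```python
def ffn_params(d_model: int, expansion: int = 4) -> int:
    """Weights and biases of a two-layer position-wise feed-forward block."""
    d_ff = d_model * expansion
    first = d_model * d_ff + d_ff        # Linear(d_model -> d_ff)
    second = d_ff * d_model + d_model    # Linear(d_ff -> d_model)
    return first + second


def depthwise_separable_conv1d_params(
    channels_in: int, channels_out: int, kernel_size: int
) -> int:
    """Weights and biases of a depthwise conv followed by a 1x1 pointwise conv."""
    depthwise = channels_in * kernel_size + channels_in    # one filter per channel
    pointwise = channels_in * channels_out + channels_out  # 1x1 channel mixing
    return depthwise + pointwise


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
d_model = 256
kernel_size = 3

dense = ffn_params(d_model)
light = depthwise_separable_conv1d_params(d_model, d_model, kernel_size)
reduction = 1.0 - light / dense

print(f"dense FFN parameters:            {dense}")
print(f"depthwise-separable parameters:  {light}")
print(f"relative reduction in this toy:  {reduction:.1%}")
```

The point of the comparison is only that a depthwise filter costs on the order of `channels * kernel_size` parameters, so most of the remaining cost sits in the pointwise mixing step rather than in two wide dense projections.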
Author Wang, Hao
Lin, Xiaobing
Zhang, Jiashu
Author_xml – sequence: 1
  givenname: Hao
  orcidid: 0009-0006-9720-203X
  surname: Wang
  fullname: Wang, Hao
  organization: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
– sequence: 2
  givenname: Xiaobing
  orcidid: 0000-0002-9627-2451
  surname: Lin
  fullname: Lin, Xiaobing
  organization: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
– sequence: 3
  givenname: Jiashu
  orcidid: 0000-0003-2086-0991
  surname: Zhang
  fullname: Zhang, Jiashu
  organization: School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
CODEN ISPLEM
CitedBy_id crossref_primary_10_1109_TASLP_2024_3492793
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/LSP.2023.3342714
Discipline Engineering
EISSN 1558-2361
EndPage 5
ExternalDocumentID 10_1109_LSP_2023_3342714
10356746
Genre orig-research
GrantInformation_xml – fundername: Sichuan Science and Technology Program
  grantid: 2022NSFSC0531
ISSN 1070-9908
IsPeerReviewed true
IsScholarly true
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-9627-2451
0009-0006-9720-203X
0000-0003-2086-0991
PQID 2911492440
PQPubID 75747
PageCount 5
PublicationDate 2024-01-01
PublicationDateYYYYMMDD 2024-01-01
PublicationDate_xml – month: 01
  year: 2024
  text: 2024-01-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE signal processing letters
PublicationTitleAbbrev LSP
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 1
SubjectTerms Artificial neural networks
Computational modeling
Computer architecture
conformer
Convolution
Convolutional neural networks
Feature extraction
Lightweight
lightweight model
network architecture
Reduction
Speaker verification
Task analysis
Transformers
Verification
Title A Lightweight CNN-Conformer Model for Automatic Speaker Verification
URI https://ieeexplore.ieee.org/document/10356746
https://www.proquest.com/docview/2911492440
Volume 31