Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling

Bibliographic Details
Published in IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, no. 1, pp. 108-121
Main Authors May, T., van de Par, S., Kohlrausch, A.
Format Journal Article
Language English
Published Piscataway, NJ IEEE 01.01.2012
Institute of Electrical and Electronics Engineers
Abstract Although the field of automatic speaker recognition (ASR) has been the subject of extensive research over the past decades, the lack of robustness against background noise has remained a major challenge. This paper describes a noise-robust speaker recognition system that combines missing data (MD) recognition with the adaptation of speaker models using a universal background model (UBM). MD recognition requires identifying which feature components are reliable and which are unreliable. For this purpose, the signal-to-noise ratio (SNR) based mask estimation performance of various state-of-the-art noise estimation techniques and noise reduction schemes is compared. Speaker recognition experiments show that using a UBM in combination with missing data recognition yields substantial improvements in recognition performance, especially in the presence of highly non-stationary background noise at low SNRs.
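The abstract does not give the mask estimation procedure itself, so the following is only a minimal sketch of what an SNR-based binary reliability mask can look like, not the authors' method. It assumes that a noise power estimate is already available for every time-frequency unit, that speech power is approximated by simple power spectral subtraction, and that a unit counts as reliable when its local SNR exceeds a threshold. The function name `snr_mask`, the 0 dB default threshold, and the `floor` constant are hypothetical choices made for illustration.

```python
import math


def snr_mask(noisy_power, noise_power, threshold_db=0.0, floor=1e-12):
    """Return a binary reliability mask (1 = reliable, 0 = unreliable).

    noisy_power and noise_power are lists of frames; each frame is a list
    of per-channel power values. Names and defaults are illustrative
    assumptions, not taken from the paper.
    """
    mask = []
    for noisy_frame, noise_frame in zip(noisy_power, noise_power):
        row = []
        for y, n in zip(noisy_frame, noise_frame):
            # Crude speech power estimate via power spectral subtraction,
            # floored so the logarithm below stays defined.
            speech = max(y - n, floor)
            # Local SNR in dB for this time-frequency unit.
            snr_db = 10.0 * math.log10(speech / max(n, floor))
            row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Toy input: two frames, three frequency channels, flat noise estimate.
noisy = [[4.0, 1.2, 10.0], [0.9, 3.0, 1.0]]
noise = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
print(snr_mask(noisy, noise))  # [[1, 0, 1], [0, 1, 0]]
```

In a missing data recognizer, units marked 1 would be scored as reliable evidence, while units marked 0 would be treated as unreliable, for example by marginalizing them out. Raising `threshold_db` makes the mask more conservative.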
Author affiliations
May, T. (tobias.may@uni-oldenburg.de): Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany
van de Par, S.: Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany
Kohlrausch, A.: Philips Res., Eindhoven, Netherlands
CODEN ITASD8
ContentType Journal Article
Copyright 2015 INIST-CNRS
DOI 10.1109/TASL.2011.2158309
Discipline Engineering
Applied Sciences
EISSN 1558-7924
ISSN 1558-7916
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Audio signal processing
Background
mel frequency cepstral coefficient (MFCC)
universal background model (UBM)
Non stationary condition
Speaker adaptation
Speaker recognition
Acoustic signal processing
Background noise
Modeling
Missing data
Automatic speaker recognition (ASR)
Cepstral analysis
Noise immunity
Robustness
Automatic recognition
mask estimation
Speech processing
noise robustness
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
CC BY 4.0
PageCount 14
SubjectTerms Adaptation model
Applied sciences
Automatic speaker recognition (ASR)
Data models
Estimation
Exact sciences and technology
Information, signal and communications theory
mask estimation
Materials
mel frequency cepstral coefficient (MFCC)
Miscellaneous
missing data
noise robustness
Signal processing
Speaker recognition
Speech
Speech processing
Speech recognition
Telecommunications and information theory
universal background model (UBM)
URI https://ieeexplore.ieee.org/document/5782936