Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, no. 1, pp. 108-121
Main Authors	May, T., van de Par, S., Kohlrausch, A.
Format	Journal Article
Language	English
Published	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 01.01.2012
Subjects	Acoustic signal processing; Adaptation model; Applied sciences; Audio signal processing; Automatic recognition; Automatic speaker recognition (ASR); Background; Background noise; Cepstral analysis; Data models; Estimation; Exact sciences and technology; Information, signal and communications theory; mask estimation; Materials; mel frequency cepstral coefficient (MFCC); Miscellaneous; missing data; Modeling; Noise immunity; noise robustness; Non stationary condition; Robustness; Signal processing; Speaker adaptation; Speaker recognition; Speech; Speech processing; Speech recognition; Telecommunications and information theory; universal background model (UBM)
Abstract	Although the field of automatic speaker recognition (ASR) has been the subject of extensive research over the past decades, the lack of robustness against background noise has remained a major challenge. This paper describes a noise-robust speaker recognition system that combines missing data (MD) recognition with the adaptation of speaker models using a universal background model (UBM). For MD recognition, the identification of reliable and unreliable feature components is required. For this purpose, the signal-to-noise ratio (SNR) based mask estimation performance of various state-of-the-art noise estimation techniques and noise reduction schemes is compared. Speaker recognition experiments show that the use of a UBM in combination with missing data recognition yields substantial improvements in recognition performance, especially in the presence of highly non-stationary background noise at low SNRs.
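The abstract's two central ideas, SNR-based mask estimation and missing data recognition, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `snr_mask` and `marginal_loglik`, the 0 dB threshold, the spectral-subtraction-style speech estimate, and the use of plain marginalization over a diagonal-covariance Gaussian mixture are all assumptions chosen for illustration, and the noise power is assumed to come from some external noise estimator.

```python
import numpy as np

def snr_mask(noisy_power, noise_power, threshold_db=0.0):
    """Binary reliability mask: True where the estimated local SNR exceeds threshold_db.

    Hypothetical helper; the threshold and the estimator are assumptions.
    """
    eps = 1e-12
    # Speech power estimated by subtracting the noise estimate, floored at zero.
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return snr_db > threshold_db

def marginal_loglik(x, mask, weights, means, variances):
    """Log-likelihood of one frame under a diagonal GMM, using reliable components only.

    Unreliable components (mask == False) are marginalized out, which for a
    diagonal covariance simply means dropping them from the product.
    """
    xr = x[mask]
    mu = means[:, mask]
    var = variances[:, mask]
    log_comp = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (xr - mu) ** 2 / var, axis=1)
    a = np.log(weights) + log_comp
    m = np.max(a)
    # Log-sum-exp over mixture components for numerical stability.
    return m + np.log(np.sum(np.exp(a - m)))

# Toy frame with three spectral channels; the middle one is dominated by noise.
noisy = np.array([10.0, 1.5, 100.0])
noise = np.array([1.0, 1.0, 1.0])
mask = snr_mask(noisy, noise)

# Single-component model with zero mean and unit variance (illustrative values).
frame = np.array([0.0, 5.0, 0.0])
score = marginal_loglik(frame, mask, np.array([1.0]), np.zeros((1, 3)), np.ones((1, 3)))
```

In this toy case the corrupted middle channel is excluded from the score, so its large deviation from the model mean does not penalize the speaker model.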
Author affiliations	May, T. (tobias.may@uni-oldenburg.de), Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany; van de Par, S., Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany; Kohlrausch, A., Philips Res., Eindhoven, Netherlands
CODEN	ITASD8
Copyright	2015 INIST-CNRS
DOI	10.1109/TASL.2011.2158309
EISSN	1558-7924
ISSN	1558-7916
IsPeerReviewed	true
IsScholarly	true
License	https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html CC BY 4.0
URI	https://ieeexplore.ieee.org/document/5782936