Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling
Published in | IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, no. 1, pp. 108-121 |
---|---|
Main Authors | May, T.; van de Par, S.; Kohlrausch, A. |
Format | Journal Article |
Language | English |
Published | Piscataway, NJ; IEEE; 01.01.2012; Institute of Electrical and Electronics Engineers |
Abstract | Although the field of automatic speaker recognition (ASR) has been the subject of extensive research over the past decades, the lack of robustness against background noise has remained a major challenge. This paper describes a noise-robust speaker recognition system that combines missing data (MD) recognition with the adaptation of speaker models using a universal background model (UBM). For MD recognition, the identification of reliable and unreliable feature components is required. For this purpose, the signal-to-noise ratio (SNR)-based mask estimation performance of various state-of-the-art noise estimation techniques and noise reduction schemes is compared. Speaker recognition experiments show that the use of a UBM in combination with missing data recognition yields substantial improvements in recognition performance, especially in the presence of highly non-stationary background noise at low SNRs. |
---|---|
Author | May, T.; Kohlrausch, A.; van de Par, S. |
Author_xml | 1. May, T. (tobias.may@uni-oldenburg.de), Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany; 2. van de Par, S., Inst. of Phys., Univ. of Oldenburg, Oldenburg, Germany; 3. Kohlrausch, A., Philips Res., Eindhoven, Netherlands |
BackLink | http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=25473438$$DView record in Pascal Francis |
CODEN | ITASD8 |
ContentType | Journal Article |
Copyright | 2015 INIST-CNRS |
DOI | 10.1109/TASL.2011.2158309 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef Pascal-Francis |
Discipline | Engineering Applied Sciences |
EISSN | 1558-7924 |
EndPage | 121 |
ExternalDocumentID | 25473438 10_1109_TASL_2011_2158309 5782936 |
Genre | orig-research |
ISSN | 1558-7916 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | Audio signal processing; Background; mel frequency cepstral coefficient (MFCC); universal background model (UBM); Non stationary condition; Speaker adaptation; Speaker recognition; Acoustic signal processing; Background noise; Modeling; Missing data; Automatic speaker recognition (ASR); Cepstral analysis; Noise immunity; Robustness; Automatic recognition; mask estimation; Speech processing; noise robustness |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html CC BY 4.0 |
PageCount | 14 |
PublicationCentury | 2000 |
PublicationDate | 2012-Jan. 2012-01-00 2012 |
PublicationDateYYYYMMDD | 2012-01-01 |
PublicationDecade | 2010 |
PublicationPlace | Piscataway, NJ |
PublicationTitle | IEEE transactions on audio, speech, and language processing |
PublicationTitleAbbrev | TASL |
PublicationYear | 2012 |
Publisher | IEEE Institute of Electrical and Electronics Engineers |
StartPage | 108 |
SubjectTerms | Adaptation model Applied sciences Automatic speaker recognition (ASR) Data models Estimation Exact sciences and technology Information, signal and communications theory mask estimation Materials mel frequency cepstral coefficient (MFCC) Miscellaneous missing data noise robustness Signal processing Speaker recognition Speech Speech processing Speech recognition Telecommunications and information theory universal background model (UBM) |
Title | Noise-Robust Speaker Recognition Combining Missing Data Techniques and Universal Background Modeling |
URI | https://ieeexplore.ieee.org/document/5782936 |
Volume | 20 |
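
As a rough illustration of the SNR-based mask estimation mentioned in the abstract, the following Python sketch marks spectral components as reliable when a local SNR estimate exceeds a threshold, and then scores a feature vector against a diagonal Gaussian using only the reliable components. The function names, the 0 dB threshold, the spectral-subtraction speech estimate, and the plain marginalization rule are all assumptions made for this example; this record does not specify the authors' implementation, and the paper may use a different scheme.

```python
import numpy as np


def estimate_binary_mask(noisy_power, noise_power, snr_threshold_db=0.0):
    """Hypothetical helper: flag components whose local SNR exceeds a threshold.

    noisy_power and noise_power are arrays of the same shape holding the
    noisy-signal power and an estimate of the noise power per component.
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    eps = 1e-12  # avoids log(0) and division by zero
    # Assumed speech estimate: spectral subtraction, floored at zero.
    speech_power = np.maximum(noisy_power - noise_power, 0.0)
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return local_snr_db > snr_threshold_db  # True = reliable


def marginal_log_likelihood(features, mask, mean, var):
    """Hypothetical helper: diagonal-Gaussian log-likelihood over reliable
    components only; unreliable components are simply ignored."""
    features = np.asarray(features, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    mean = np.asarray(mean, dtype=float)
    var = np.asarray(var, dtype=float)
    x, mu, v = features[mask], mean[mask], var[mask]
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * v) + (x - mu) ** 2 / v)))
```

In a full system of the kind the abstract describes, the Gaussian parameters would come from speaker models adapted from a UBM, and the noise power would be supplied by one of the noise estimation techniques being compared; both are outside the scope of this sketch.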