An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Bibliographic Details
Published in	Computer Speech & Language, Vol. 46, pp. 535-557
Main Authors	Vincent, Emmanuel; Watanabe, Shinji; Nugraha, Aditya Arie; Barker, Jon; Marxer, Ricard
Format	Journal Article
Language	English
Published	Elsevier Ltd, 01.11.2017
Subjects	Computer Science; Signal and Image Processing; Microphone array; Robust ASR; Speech enhancement; Train/test mismatch

Abstract
• An analysis of the impact of acoustic mismatches between training and test data on the performance of robust ASR.
• Including: environment, microphone and data simulation mismatches.
• Based on: a critical analysis of the results published on the CHiME-3 dataset and new experiments.
• Result: with the exception of MVDR beamforming, these mismatches have little effect on the ASR performance.
• Contribution: the CHiME-4 challenge, which revisits the CHiME-3 dataset and reduces the number of microphones available for testing.

Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR techniques. In this article, we study this issue in the context of the CHiME-3 dataset, which consists of sentences spoken by talkers situated in challenging noisy environments recorded using a 6-channel tablet based microphone array. We provide a critical analysis of the results published on this dataset for various signal enhancement, feature extraction, and ASR backend techniques and perform a number of new experiments in order to separately assess the impact of different noise environments, different numbers and positions of microphones, or simulated vs. real data on speech enhancement and ASR performance. We show that, with the exception of minimum variance distortionless response (MVDR) beamforming, most algorithms perform consistently on real and simulated data and can benefit from training on simulated data. We also find that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact. Based on these results, we introduce the CHiME-4 Speech Separation and Recognition Challenge, which revisits the CHiME-3 dataset and makes it more challenging by reducing the number of microphones available for testing.
Author affiliations
1. Vincent, Emmanuel (emmanuel.vincent@inria.fr): Inria, 54600 Villers-lès-Nancy, France
2. Watanabe, Shinji: Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA
3. Nugraha, Aditya Arie: Inria, 54600 Villers-lès-Nancy, France
4. Barker, Jon: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
5. Marxer, Ricard (ORCID 0000-0001-5099-5059): Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
BackLink	https://inria.hal.science/hal-01399180 (View record in HAL)
Cites_doi	10.1109/TASL.2009.2029711 10.1109/78.934132 10.1016/j.csl.2012.07.008 10.1109/TASLP.2015.2473684 10.1007/BF02999432 10.1109/LSP.2013.2291240 10.1016/j.specom.2015.09.004 10.1109/TASLP.2014.2352935 10.1109/MSP.2009.932166 10.1109/TASL.2007.902460 10.1109/TASLP.2016.2580946 10.1016/j.csl.2010.12.003 10.1016/j.csl.2012.10.004 10.1016/j.sigpro.2007.01.016 10.1006/csla.1998.0043 10.1109/TASL.2010.2045183 10.1109/TASSP.1987.1165054 10.1155/S1110865703305074 10.1109/TASL.2010.2050716
ContentType	Journal Article
Copyright	2016 Elsevier Ltd Distributed under a Creative Commons Attribution 4.0 International License
DOI	10.1016/j.csl.2016.11.005
Discipline	Engineering Computer Science
EISSN	1095-8363
EndPage	557
ISSN	0885-2308
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
Keywords	Speech enhancement; Microphone array; Train/test mismatch; Robust ASR
Language	English
License	Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID	0000-0001-5099-5059 0000-0002-0183-7289
OpenAccessLink	https://inria.hal.science/hal-01399180
PageCount	23
PublicationDate	2017-11-01
PublicationTitle	Computer speech & language
PublicationYear	2017
Publisher	Elsevier Ltd Elsevier
system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) – volume: 15 start-page: 2011 year: 2007 end-page: 2023 ident: bib0002 article-title: Acoustic beamforming for speaker diarization of meetings publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. – start-page: 496 year: 2015 end-page: 503 ident: bib0004 article-title: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) – volume: 26 start-page: 75 year: 2009 end-page: 80 ident: bib0005 article-title: Research developments and directions in speech recognition and understanding, part 1 publication-title: IEEE Signal Process. Mag. – year: 2015 ident: bib0043 article-title: Mutual Benefits of Auditory Spectro-Temporal Gabor Features and Deep Learning for the 3rd CHiME Challenge – year: 2011 ident: bib0052 article-title: The kaldi speech recognition toolkit publication-title: Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (ASRU) – reference: . – volume: 19 start-page: 69 year: 2010 end-page: 84 ident: bib0074 article-title: Blind separation and dereverberation of speech mixtures by joint optimization publication-title: IEEE Trans. Audio Speech Lang. Process. – reference: Wang, X., Wu, C., Zhang, P., Wang, Z., Liu, Y., Li, X., Fu, Q., Yan, Y., 2015. Noise robust IOA/CAS speech separation and recognition system for the third ’CHIME’ challenge. 
ArXiv: – start-page: 1045 year: 2010 end-page: 1048 ident: bib0045 article-title: Recurrent neural network based language model publication-title: Proceedings of the Interspeech – year: 1994 ident: bib0038 article-title: The translingual English database (TED) publication-title: Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP) – start-page: 100 year: 2015 end-page: 107 ident: bib0034 article-title: Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures publication-title: Proceedings of the 12th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA) – reference: Shinoda, K., 2011. Speaker adaptation techniques for automatic speech recognition. Proceedings of the APSIPA ASC 2011. – start-page: 347 year: 1997 end-page: 354 ident: bib0018 article-title: A post-processing system to yield reduced word error rates: recognizer output voting error reduction (ROVER) publication-title: Proceedings of the 1997 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) – volume: 49 start-page: 1614 year: 2001 end-page: 1626 ident: bib0022 article-title: Signal enhancement using beamforming and nonstationarity with applications to speech publication-title: IEEE Trans. Signal Process. 
– start-page: 157 year: 2001 end-page: 180 ident: bib0015 article-title: Robust localization in reverberant rooms publication-title: Microphone Arrays: Signal Processing Techniques and Applications – start-page: 83 year: 2015 end-page: 90 ident: bib0012 article-title: Noise perturbation improves supervised speech separation publication-title: Proceedings of the 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA) – start-page: 468 year: 2015 end-page: 474 ident: bib0048 article-title: A CHiME-3 challenge system: long-term acoustic features for noise robust automatic speech recognition publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) – start-page: 444 year: 2015 end-page: 451 ident: bib0027 article-title: BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) – volume: 7/8 start-page: 439 year: 1994 end-page: 446 ident: bib0059 article-title: Suppression of coherent and incoherent noise using a microphone array publication-title: Ann. Telecommun. – start-page: 1 year: 2012 ident: 10.1016/j.csl.2016.11.005_bib0037 article-title: Microphone array processing for distant speech recognition: Towards real-world deployment – volume: 18 start-page: 382 issue: 2 year: 2010 ident: 10.1016/j.csl.2016.11.005_bib0042 article-title: Model-based expectation maximization source separation and localization publication-title: IEEE Trans. Audio Speech Lang. Process. doi: 10.1109/TASL.2009.2029711 – volume: 49 start-page: 1614 issue: 8 year: 2001 ident: 10.1016/j.csl.2016.11.005_bib0022 article-title: Signal enhancement using beamforming and nonstationarity with applications to speech publication-title: IEEE Trans. Signal Process. 
doi: 10.1109/78.934132 – start-page: 468 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0048 article-title: A CHiME-3 challenge system: long-term acoustic features for noise robust automatic speech recognition – start-page: 444 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0027 article-title: BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge – start-page: 1 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0035 article-title: The REVERB challenge: a common evaluation framework for dereverberation and recognition of reverberant speech – volume: 15 start-page: 617 issue: 2 year: 2007 ident: 10.1016/j.csl.2016.11.005_bib0016 article-title: Superdirective beamforming robust against microphone mismatch publication-title: IEEE Trans. Acoust. Speech Signal Process. – volume: 27 start-page: 763 issue: 3 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0030 article-title: Modelling non-stationary noise with spectral factorisation in automatic speech recognition publication-title: Comput. Speech Lang. doi: 10.1016/j.csl.2012.07.008 – start-page: 459 year: 2003 ident: 10.1016/j.csl.2016.11.005_bib0044 article-title: On diagonal loading for minimum variance beamformers – start-page: 687 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0010 article-title: The MGB challenge: Evaluating multi-genre broadcast media recognition – volume: 23 start-page: 2189 issue: 12 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0001 article-title: Spatially robust far-field beamforming using the von Mises(-Fisher) distribution publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. 
doi: 10.1109/TASLP.2015.2473684 – start-page: 886 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0046 article-title: Damped oscillator cepstral coefficients for robust speech recognition – start-page: 171 year: 2014 ident: 10.1016/j.csl.2016.11.005_bib0063 article-title: Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models – year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0039 – start-page: 7398 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0057 article-title: An investigation of deep neural networks for noise robust speech recognition – volume: 7/8 start-page: 439 year: 1994 ident: 10.1016/j.csl.2016.11.005_bib0059 article-title: Suppression of coherent and incoherent noise using a microphone array publication-title: Ann. Telecommun. doi: 10.1007/BF02999432 – start-page: 152 year: 2011 ident: 10.1016/j.csl.2016.11.005_bib0032 article-title: ivector-based discriminative adaptation for automatic speech recognition – year: 2015 ident: 10.1016/j.csl.2016.11.005_sbref0041 – start-page: 532 year: 1989 ident: 10.1016/j.csl.2016.11.005_bib0024 article-title: Some statistical issues in the comparison of speech recognition algorithms – year: 2016 ident: 10.1016/j.csl.2016.11.005_bib0050 article-title: Multichannel music separation with deep neural networks – year: 2016 ident: 10.1016/j.csl.2016.11.005_sbref0007 article-title: The third ‘CHiME’ speech separation and recognition challenge: analysis and outcomes publication-title: Comput. 
Speech Lang – start-page: 1045 year: 2010 ident: 10.1016/j.csl.2016.11.005_bib0045 article-title: Recurrent neural network based language model – start-page: 6 year: 2014 ident: 10.1016/j.csl.2016.11.005_bib0056 article-title: Unbiased coherent-to-diffuse ratio estimation for dereverberation – year: 2009 ident: 10.1016/j.csl.2016.11.005_bib0071 – start-page: 423 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0067 article-title: Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge – year: 2007 ident: 10.1016/j.csl.2016.11.005_bib0023 – start-page: 100 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0034 article-title: Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures – volume: 21 start-page: 65 issue: 1 year: 2014 ident: 10.1016/j.csl.2016.11.005_bib0072 article-title: An experimental study on speech enhancement based on deep neural networks publication-title: IEEE Signal Process. Lett. doi: 10.1109/LSP.2013.2291240 – year: 2012 ident: 10.1016/j.csl.2016.11.005_bib0066 – start-page: 496 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0004 article-title: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition – volume: 76 start-page: 170 year: 2016 ident: 10.1016/j.csl.2016.11.005_bib0011 article-title: On the relationship between early-to-late ratio of room impulse responses and ASR performance in reverberant environments publication-title: Speech Commun. 
doi: 10.1016/j.specom.2015.09.004 – start-page: 181 year: 2000 ident: 10.1016/j.csl.2016.11.005_bib0028 article-title: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions – start-page: 482 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0060 article-title: Robust ASR using neural network based speech enhancement and feature simulation – year: 2011 ident: 10.1016/j.csl.2016.11.005_bib0052 article-title: The kaldi speech recognition toolkit – volume: 22 start-page: 1849 issue: 12 year: 2014 ident: 10.1016/j.csl.2016.11.005_bib0069 article-title: On training targets for supervised speech separation publication-title: IEEE/ACM Trans. Audio, Speech Lang. Process. doi: 10.1109/TASLP.2014.2352935 – start-page: 2180 year: 2014 ident: 10.1016/j.csl.2016.11.005_bib0033 article-title: Adaptation of deep neural network acoustic models using factorised i-vectors – start-page: 1116 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0019 article-title: The Sheffield wargames corpus – start-page: 436 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0073 article-title: The NTT CHiME-3 system: advances in speech enhancement and recognition for mobile multi-microphone devices – start-page: 347 year: 1997 ident: 10.1016/j.csl.2016.11.005_bib0018 article-title: A post-processing system to yield reduced word error rates: recognizer output voting error reduction (ROVER) – start-page: 475 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0029 article-title: The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition – start-page: 547 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0026 article-title: The automatic speech recognition in reverberant environments (ASpIRE) challenge – volume: vol. 
5 start-page: 2578 year: 1988 ident: 10.1016/j.csl.2016.11.005_bib0075 article-title: A microphone array with adaptive post-filtering for noise reduction in reverberant rooms – volume: 26 start-page: 75 issue: 3 year: 2009 ident: 10.1016/j.csl.2016.11.005_bib0005 article-title: Research developments and directions in speech recognition and understanding, part 1 publication-title: IEEE Signal Process. Mag. doi: 10.1109/MSP.2009.932166 – volume: 15 start-page: 2011 issue: 7 year: 2007 ident: 10.1016/j.csl.2016.11.005_bib0002 article-title: Acoustic beamforming for speaker diarization of meetings publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. doi: 10.1109/TASL.2007.902460 – volume: 24 start-page: 1652 year: 2016 ident: 10.1016/j.csl.2016.11.005_bib0049 article-title: Multichannel audio source separation with deep neural networks publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. doi: 10.1109/TASLP.2016.2580946 – year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0064 article-title: The Overview of the MELCO ASR System for the Third CHiME Challenge – start-page: 504 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0007 article-title: The third ‘CHiME’ speech separation and recognition challenge: dataset, task and baselines – volume: vol.1 start-page: 181 year: 1995 ident: 10.1016/j.csl.2016.11.005_bib0036 article-title: Improved backing-off for m-gram language modeling – volume: 26 start-page: 52 issue: 1 year: 2011 ident: 10.1016/j.csl.2016.11.005_bib0062 article-title: The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments publication-title: Comput. Speech Lang. 
doi: 10.1016/j.csl.2010.12.003 – ident: 10.1016/j.csl.2016.11.005_bib0051 – start-page: 416 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0020 article-title: Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection – start-page: 275 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0054 article-title: The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments – start-page: 309 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0031 article-title: Elastic spectral distortion for low resource speech recognition with deep neural networks – volume: 9 start-page: 310 issue: 6 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0061 article-title: Improvement of microphone array characteristics for speech capturing publication-title: Mod. Appl. Sci. – ident: 10.1016/j.csl.2016.11.005_bib0068 – start-page: 83 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0012 article-title: Noise perturbation improves supervised speech separation – start-page: 76 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0041 article-title: Scalable audio separation with light kernel additive modelling – year: 2010 ident: 10.1016/j.csl.2016.11.005_bib0013 – volume: 27 start-page: 621 issue: 3 year: 2013 ident: 10.1016/j.csl.2016.11.005_bib0009 article-title: The PASCAL CHiME speech separation and recognition challenge publication-title: Comput. Speech Lang. doi: 10.1016/j.csl.2012.10.004 – start-page: 2023 year: 2001 ident: 10.1016/j.csl.2016.11.005_bib0025 article-title: “CU-Move”: analysis & corpus development for interactive in-vehicle speech systems – volume: 87 issue: 8 year: 2007 ident: 10.1016/j.csl.2016.11.005_sbref0061 article-title: Oracle estimators for the benchmarking of source separation algorithms publication-title: Signal Process. 
doi: 10.1016/j.sigpro.2007.01.016 – ident: 10.1016/j.csl.2016.11.005_bib0058 – start-page: 1749 year: 2014 ident: 10.1016/j.csl.2016.11.005_bib0047 article-title: Medium duration modulation cepstral feature for robust speech recognition – volume: 12 start-page: 75 issue: 2 year: 1998 ident: 10.1016/j.csl.2016.11.005_bib0021 article-title: Maximum likelihood linear transformations for HMM-based speech recognition publication-title: Comput. Speech Lang. doi: 10.1006/csla.1998.0043 – volume: 19 start-page: 69 issue: 1 year: 2010 ident: 10.1016/j.csl.2016.11.005_bib0074 article-title: Blind separation and dereverberation of speech mixtures by joint optimization publication-title: IEEE Trans. Audio Speech Lang. Process. doi: 10.1109/TASL.2010.2045183 – volume: 35 start-page: 1365 issue: 10 year: 1987 ident: 10.1016/j.csl.2016.11.005_bib0014 article-title: Robust adaptive beamforming publication-title: IEEE Trans. Acoust. Speech Signal Process. doi: 10.1109/TASSP.1987.1165054 – start-page: 157 year: 2001 ident: 10.1016/j.csl.2016.11.005_bib0015 article-title: Robust localization in reverberant rooms – volume: 11 start-page: 1157 year: 2003 ident: 10.1016/j.csl.2016.11.005_bib0003 article-title: Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures publication-title: EURASIP J. Appl. Signal Process. 
doi: 10.1155/S1110865703305074 – start-page: 91 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0070 article-title: Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR – start-page: 401 year: 2015 ident: 10.1016/j.csl.2016.11.005_bib0053 article-title: Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition – start-page: 115 year: 2008 ident: 10.1016/j.csl.2016.11.005_bib0055 article-title: Interpretation of multiparty meetings: the AMI and AMIDA projects – ident: 10.1016/j.csl.2016.11.005_bib0006 – volume: 18 start-page: 1830 issue: 7 year: 2010 ident: 10.1016/j.csl.2016.11.005_bib0017 article-title: Under-determined reverberant audio source separation using a full-rank spatial covariance model publication-title: IEEE/ACM Trans. Audio Speech Lang. Process. doi: 10.1109/TASL.2010.2050716 – ident: 10.1016/j.csl.2016.11.005_bib0040 – year: 1994 ident: 10.1016/j.csl.2016.11.005_bib0038 article-title: The translingual English database (TED)
SourceID	hal; crossref; elsevier
SourceType	Open Access Repository; Enrichment Source; Index Database; Publisher
StartPage	535
SubjectTerms	Computer Science; Microphone array; Robust ASR; Signal and Image Processing; Speech enhancement; Train/test mismatch
Title	An analysis of environment, microphone and data simulation mismatches in robust speech recognition
URI	https://dx.doi.org/10.1016/j.csl.2016.11.005 https://inria.hal.science/hal-01399180
Volume	46