The third ‘CHiME’ speech separation and recognition challenge: Analysis and outcomes
Published in | Computer speech & language Vol. 46; pp. 605 - 626 |
---|---|
Main Authors | Barker, Jon; Marxer, Ricard; Vincent, Emmanuel; Watanabe, Shinji |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd; Elsevier; 01.11.2017 |
Subjects | |
Abstract | • The presentation of a unique multi-microphone speech recognition challenge with speech recorded in real environments.
• A detailed characterisation of the challenge audio using novel analyses to estimate key properties of the speakers, environments and noisy speech signals.
• An overview of 26 systems submitted to the challenge presenting a snapshot of the state-of-the-art in distant microphone ASR.
• A presentation of system performance identifying which signal processing and statistical modelling techniques are the most beneficial.
• A presentation of correlations between signal characteristics and system performances across utterances addressing the question, “What are the particular circumstances that lead to high word error rates?”
This paper presents the design and outcomes of the CHiME-3 challenge, the first open speech recognition evaluation designed to target the increasingly relevant multichannel, mobile-device speech recognition scenario. The paper serves two purposes. First, it provides a definitive reference for the challenge, including full descriptions of the task design, data capture and baseline systems along with a description and evaluation of the 26 systems that were submitted. The best systems re-engineered every stage of the baseline resulting in reductions in word error rate from 33.4% to as low as 5.8%. By comparing across systems, techniques that are essential for strong performance are identified. Second, the paper considers the problem of drawing conclusions from evaluations that use speech directly recorded in noisy environments. The degree of challenge presented by the resulting material is hard to control and hard to fully characterise. We attempt to dissect the various ‘axes of difficulty’ by correlating various estimated signal properties with typical system performance on a per session and per utterance basis. We find strong evidence of a dependence on signal-to-noise ratio and channel quality. Systems are less sensitive to variations in the degree of speaker motion. The paper concludes by discussing the outcomes of CHiME-3 in relation to the design of future mobile speech recognition evaluations. |
---|---|
Author | Vincent, Emmanuel; Barker, Jon; Watanabe, Shinji; Marxer, Ricard |
Author_xml | – sequence: 1; givenname: Jon; surname: Barker; fullname: Barker, Jon; email: j.p.barker@sheffield.ac.uk; organization: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
– sequence: 2; givenname: Ricard; orcidid: 0000-0001-5099-5059; surname: Marxer; fullname: Marxer, Ricard; organization: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
– sequence: 3; givenname: Emmanuel; surname: Vincent; fullname: Vincent, Emmanuel; organization: Inria, Villers-lès-Nancy 54600, France
– sequence: 4; givenname: Shinji; surname: Watanabe; fullname: Watanabe, Shinji; organization: Mitsubishi Electric Research Laboratories, Cambridge, MA 02139-1955, USA |
BackLink | https://inria.hal.science/hal-01382108 (View record in HAL) |
CitedBy_id | crossref_primary_10_1109_TASLP_2024_3426924 crossref_primary_10_1121_10_0025272 crossref_primary_10_1016_j_eswa_2024_126349 crossref_primary_10_1002_tee_22868 crossref_primary_10_1109_TASLP_2021_3083405 crossref_primary_10_1109_TASLP_2020_3019181 crossref_primary_10_1007_s00530_023_01155_1 crossref_primary_10_1109_TASLP_2022_3224288 crossref_primary_10_1250_ast_e24_124 crossref_primary_10_1109_TASLP_2021_3092567 crossref_primary_10_1016_j_specom_2018_11_005 crossref_primary_10_1109_LSP_2021_3099715 crossref_primary_10_1109_TASLP_2019_2944348 crossref_primary_10_1007_s10772_021_09847_7 crossref_primary_10_1109_LSP_2018_2880285 crossref_primary_10_1109_TETCI_2022_3228537 crossref_primary_10_1016_j_specom_2023_04_001 crossref_primary_10_1021_acsami_8b22613 crossref_primary_10_1109_JSTSP_2017_2764276 crossref_primary_10_1109_TASLP_2021_3067154 crossref_primary_10_1186_s13636_024_00387_x crossref_primary_10_1109_TASLP_2021_3082702 crossref_primary_10_1109_LSP_2024_3449218 crossref_primary_10_1016_j_csl_2022_101409 crossref_primary_10_1016_j_dsp_2019_102632 crossref_primary_10_1109_TASLP_2022_3196168 crossref_primary_10_1109_ACCESS_2021_3139508 crossref_primary_10_5802_roia_51 crossref_primary_10_1109_JPROC_2020_3018668 crossref_primary_10_1109_LSP_2021_3056279 crossref_primary_10_1016_j_specom_2018_05_004 crossref_primary_10_1109_ACCESS_2023_3243690 crossref_primary_10_1109_TASLP_2024_3374065 crossref_primary_10_1109_LSP_2018_2791534 crossref_primary_10_1016_j_csl_2025_101780 |
Cites_doi | 10.1109/TASL.2011.2114881 10.1109/89.326616 10.1109/TASL.2013.2281574 10.1016/j.csl.2012.10.004 10.1016/j.csl.2016.11.005 10.1109/TASL.2007.902460 10.1007/s10579-007-9054-4 10.1080/00031305.1989.10475612 10.1121/1.1915637 |
ContentType | Journal Article |
Copyright | 2016 Elsevier Ltd. Distributed under a Creative Commons Attribution 4.0 International License |
DOI | 10.1016/j.csl.2016.10.005 |
DatabaseName | CrossRef; Hyper Article en Ligne (HAL); Hyper Article en Ligne (HAL) (Open Access) |
DatabaseTitle | CrossRef |
Discipline | Engineering; Computer Science |
EISSN | 1095-8363 |
EndPage | 626 |
ExternalDocumentID | oai_HAL_hal_01382108v1 10_1016_j_csl_2016_10_005 S088523081630122X |
ISSN | 0885-2308 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | ‘CHiME’ challenge; Microphone array; Noise-robust ASR |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0001-5099-5059; 0000-0002-0183-7289 |
OpenAccessLink | https://inria.hal.science/hal-01382108 |
PageCount | 22 |
PublicationCentury | 2000 |
PublicationDate | 2017-11-01 |
PublicationDecade | 2010 |
PublicationTitle | Computer speech & language |
PublicationYear | 2017 |
Publisher | Elsevier Ltd; Elsevier |
doi: 10.1016/j.csl.2012.10.004 – year: 2015 ident: 10.1016/j.csl.2016.10.005_bib0039 article-title: The Overview of the MELCO ASR System for the Third CHiME Challenge – ident: 10.1016/j.csl.2016.10.005_bib0045 doi: 10.1016/j.csl.2016.11.005 – volume: 4 start-page: 29 year: 2000 ident: 10.1016/j.csl.2016.10.005_bib0018 article-title: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions – start-page: 459 year: 2003 ident: 10.1016/j.csl.2016.10.005_bib0025 article-title: On diagonal loading for minimum variance beamformers – ident: 10.1016/j.csl.2016.10.005_bib0010 – year: 2015 ident: 10.1016/j.csl.2016.10.005_sbref0040 article-title: System Combination for Multi-Channel Noise Robust ASR – volume: 15 start-page: 2011 issue: 7 year: 2007 ident: 10.1016/j.csl.2016.10.005_bib0001 article-title: Acoustic beamforming for speaker diarization of meetings publication-title: IEEE Trans. Audio Speech Lang. Process. doi: 10.1109/TASL.2007.902460 – year: 2007 ident: 10.1016/j.csl.2016.10.005_bib0014 – start-page: 475 year: 2015 ident: 10.1016/j.csl.2016.10.005_bib0019 article-title: The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition – volume: 2 start-page: 3 year: 2010 ident: 10.1016/j.csl.2016.10.005_bib0026 article-title: Recurrent neural network based language model – start-page: 452 year: 2015 ident: 10.1016/j.csl.2016.10.005_bib0032 article-title: Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results – volume: 41 start-page: 389 issue: 3–4 year: 2007 ident: 10.1016/j.csl.2016.10.005_bib0029 article-title: The CHIL audiovisual corpus for lecture and meeting analysis inside smart rooms publication-title: Lang. Resour. Eval. 
doi: 10.1007/s10579-007-9054-4 – volume: 43 start-page: 50 issue: 1 year: 1989 ident: 10.1016/j.csl.2016.10.005_bib0012 article-title: Some implementations of the boxplot publication-title: Am. Stat. doi: 10.1080/00031305.1989.10475612 – ident: 10.1016/j.csl.2016.10.005_bib0027 – volume: 82 start-page: 82 issue: 5 year: 1933 ident: 10.1016/j.csl.2016.10.005_bib0011 article-title: Loudness, its definition, measurement and calculation publication-title: J. Acoust. Soc. Am. doi: 10.1121/1.1915637 – start-page: 490 year: 2015 ident: 10.1016/j.csl.2016.10.005_bib0024 article-title: Exploiting synchrony spectra and deep neural networks for noise-robust automatic speech recognition – ident: 10.1016/j.csl.2016.10.005_bib0030 |
StartPage | 605 |
SubjectTerms | Computer Science; Microphone array; Noise-robust ASR; Signal and Image Processing; ‘CHiME’ challenge |
Title | The third ‘CHiME’ speech separation and recognition challenge: Analysis and outcomes |
URI | https://dx.doi.org/10.1016/j.csl.2016.10.005 https://inria.hal.science/hal-01382108 |
Volume | 46 |
linkProvider | Elsevier |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=The+third+%27CHIME%27+speech+separation+and+recognition+challenge%3A+Analysis+and+outcomes&rft.jtitle=Computer+speech+%26+language&rft.au=Barker%2C+Jon&rft.au=Marxer%2C+Ricard&rft.au=Vincent%2C+Emmanuel&rft.au=Watanabe%2C+Shinji&rft.date=2017-11-01&rft.pub=Elsevier&rft.issn=0885-2308&rft.eissn=1095-8363&rft.volume=46&rft.spage=605&rft.epage=626&rft_id=info:doi/10.1016%2Fj.csl.2016.10.005&rft.externalDBID=HAS_PDF_LINK&rft.externalDocID=oai_HAL_hal_01382108v1 |