An analysis of environment, microphone and data simulation mismatches in robust speech recognition

Bibliographic Details
Published in Computer Speech & Language, Vol. 46, pp. 535-557
Main Authors Vincent, Emmanuel; Watanabe, Shinji; Nugraha, Aditya Arie; Barker, Jon; Marxer, Ricard
Format Journal Article
Language English
Published Elsevier Ltd, 01.11.2017
Abstract
• An analysis of the impact of acoustic mismatches between training and test data on the performance of robust ASR.
• Including: environment, microphone and data simulation mismatches.
• Based on: a critical analysis of the results published on the CHiME-3 dataset and new experiments.
• Result: with the exception of MVDR beamforming, these mismatches have little effect on the ASR performance.
• Contribution: the CHiME-4 challenge, which revisits the CHiME-3 dataset and reduces the number of microphones available for testing.
Speech enhancement and automatic speech recognition (ASR) are most often evaluated in matched (or multi-condition) settings where the acoustic conditions of the training data match (or cover) those of the test data. Few studies have systematically assessed the impact of acoustic mismatches between training and test data, especially concerning recent speech enhancement and state-of-the-art ASR techniques. In this article, we study this issue in the context of the CHiME-3 dataset, which consists of sentences spoken by talkers situated in challenging noisy environments recorded using a 6-channel tablet-based microphone array. We provide a critical analysis of the results published on this dataset for various signal enhancement, feature extraction, and ASR backend techniques and perform a number of new experiments to assess separately the impact of different noise environments, different numbers and positions of microphones, or simulated vs. real data on speech enhancement and ASR performance. We show that, with the exception of minimum variance distortionless response (MVDR) beamforming, most algorithms perform consistently on real and simulated data and can benefit from training on simulated data. 
We also find that training on different noise environments and different microphones barely affects the ASR performance, especially when several environments are present in the training data: only the number of microphones has a significant impact. Based on these results, we introduce the CHiME-4 Speech Separation and Recognition Challenge, which revisits the CHiME-3 dataset and makes it more challenging by reducing the number of microphones available for testing.
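The abstract singles out minimum variance distortionless response (MVDR) beamforming as the exception to otherwise consistent behaviour on real and simulated data. As background only, here is a minimal sketch of the textbook per-frequency MVDR weights, w = R^-1 d / (d^H R^-1 d), where R is the noise spatial covariance matrix and d is the steering vector of the target talker. It is not the authors' implementation. The helper names (`solve`, `mvdr_weights`, `beamform`), the steering vector and the noise covariance are hypothetical illustration values; in practice R and d have to be estimated from the multichannel recordings.

```python
import cmath


def solve(a, b):
    """Solve a @ x = b for a small complex square system.

    Plain Gaussian elimination with partial pivoting; a hypothetical helper
    that only keeps this sketch dependency-free.
    """
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented copy [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot magnitude for stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / m[r][r]
    return x


def mvdr_weights(noise_cov, steering):
    """MVDR weights w = R^-1 d / (d^H R^-1 d) for one frequency bin."""
    r_inv_d = solve(noise_cov, steering)  # R^-1 d without forming the inverse
    denom = sum(di.conjugate() * xi for di, xi in zip(steering, r_inv_d))  # d^H R^-1 d
    return [xi / denom for xi in r_inv_d]


def beamform(weights, frame):
    """Beamformer output y = w^H x for one multichannel STFT frame x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(weights, frame))


# Hypothetical single-bin example: 6 microphones (the CHiME-3 array has 6),
# a far-field steering vector and an assumed noise covariance R = I + 0.5 * ones.
num_mics = 6
d = [cmath.exp(-1j * cmath.pi * 0.3 * k) for k in range(num_mics)]
R = [[(1.5 if i == j else 0.5) + 0j for j in range(num_mics)] for i in range(num_mics)]
w = mvdr_weights(R, d)
print(abs(beamform(w, d)))  # distortionless constraint: 1 up to rounding
```

The constraint w^H d = 1 passes the target undistorted, and normalising by d^H R^-1 d selects, among all such weights, the one with the lowest output noise power w^H R w. In real code a library solver such as `numpy.linalg.solve` would normally replace the hand-written elimination.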
Author Vincent, Emmanuel
Barker, Jon
Watanabe, Shinji
Nugraha, Aditya Arie
Marxer, Ricard
Author_xml – sequence: 1
  givenname: Emmanuel
  surname: Vincent
  fullname: Vincent, Emmanuel
  email: emmanuel.vincent@inria.fr
  organization: Inria, 54600 Villers-lès-Nancy, France
– sequence: 2
  givenname: Shinji
  surname: Watanabe
  fullname: Watanabe, Shinji
  organization: Mitsubishi Electric Research Laboratories, Cambridge, MA 02139, USA
– sequence: 3
  givenname: Aditya Arie
  surname: Nugraha
  fullname: Nugraha, Aditya Arie
  organization: Inria, 54600 Villers-lès-Nancy, France
– sequence: 4
  givenname: Jon
  surname: Barker
  fullname: Barker, Jon
  organization: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
– sequence: 5
  givenname: Ricard
  orcidid: 0000-0001-5099-5059
  surname: Marxer
  fullname: Marxer, Ricard
  organization: Department of Computer Science, University of Sheffield, Sheffield S1 4DP, UK
BackLink https://inria.hal.science/hal-01399180 (View record in HAL)
ContentType Journal Article
Copyright 2016 Elsevier Ltd
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1016/j.csl.2016.11.005
Discipline Engineering
Computer Science
EISSN 1095-8363
EndPage 557
ExternalDocumentID oai_HAL_hal_01399180v1
10_1016_j_csl_2016_11_005
S0885230816301231
ISSN 0885-2308
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords Speech enhancement
Microphone array
Train/test mismatch
Robust ASR
Language English
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-5099-5059
0000-0002-0183-7289
OpenAccessLink https://inria.hal.science/hal-01399180
PageCount 23
PublicationCentury 2000
PublicationDate 2017-11-01
PublicationDecade 2010
PublicationTitle Computer Speech & Language
PublicationYear 2017
Publisher Elsevier Ltd
Elsevier
References_xml – start-page: 416
  year: 2015
  end-page: 422
  ident: bib0020
  article-title: Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: vol. 5
  start-page: 2578
  year: 1988
  end-page: 2581
  ident: bib0075
  article-title: A microphone array with adaptive post-filtering for noise reduction in reverberant rooms
  publication-title: Proceedings of the 1988 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
– start-page: 687
  year: 2015
  end-page: 693
  ident: bib0010
  article-title: The MGB challenge: Evaluating multi-genre broadcast media recognition
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 15
  start-page: 617
  year: 2007
  end-page: 631
  ident: bib0016
  article-title: Superdirective beamforming robust against microphone mismatch
  publication-title: IEEE Trans. Acoust. Speech Signal Process.
– volume: 22
  start-page: 1849
  year: 2014
  end-page: 1858
  ident: bib0069
  article-title: On training targets for supervised speech separation
  publication-title: IEEE/ACM Trans. Audio, Speech Lang. Process.
– volume: 18
  start-page: 382
  year: 2010
  end-page: 394
  ident: bib0042
  article-title: Model-based expectation maximization source separation and localization
  publication-title: IEEE Trans. Audio Speech Lang. Process.
– reference: Lin, M., Chen, Q., Yan, S., 2014. Network in network. ArXiv:
– start-page: 1749
  year: 2014
  end-page: 1753
  ident: bib0047
  article-title: Medium duration modulation cepstral feature for robust speech recognition
  publication-title: Proceedings of the 2014 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
– year: 2012
  ident: bib0066
  publication-title: Techniques for Noise Robustness in Automatic Speech Recognition
– start-page: 482
  year: 2015
  end-page: 489
  ident: bib0060
  article-title: Robust ASR using neural network based speech enhancement and feature simulation
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 26
  start-page: 52
  year: 2011
  end-page: 66
  ident: bib0062
  article-title: The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments
  publication-title: Comput. Speech Lang.
– start-page: 423
  year: 2015
  end-page: 429
  ident: bib0067
  article-title: Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge
  publication-title: Proceedings of the 2015IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 21
  start-page: 65
  year: 2014
  end-page: 68
  ident: bib0072
  article-title: An experimental study on speech enhancement based on deep neural networks
  publication-title: IEEE Signal Process. Lett.
– start-page: 171
  year: 2014
  end-page: 176
  ident: bib0063
  article-title: Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
  publication-title: Proceedings of the 2014 IEEE Spoken Language Technology Workshop (SLT)
– volume: vol.1
  start-page: 181
  year: 1995
  end-page: 184
  ident: bib0036
  article-title: Improved backing-off for m-gram language modeling
  publication-title: Proceedings of the 1995 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
– start-page: 1
  year: 2012
  end-page: 10
  ident: bib0037
  article-title: Microphone array processing for distant speech recognition: Towards real-world deployment
  publication-title: Proceedings of the APSIPA Annual Summit and Conf.
– start-page: 504
  year: 2015
  end-page: 511
  ident: bib0007
  article-title: The third ‘CHiME’ speech separation and recognition challenge: dataset, task and baselines
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 11
  start-page: 1157
  year: 2003
  end-page: 1166
  ident: bib0003
  article-title: Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures
  publication-title: EURASIP J. Appl. Signal Process.
– year: 2007
  ident: bib0023
  publication-title: CSR-I (WSJ0) Complete
– volume: 24
  start-page: 1652
  year: 2016
  end-page: 1664
  ident: bib0049
  article-title: Multichannel audio source separation with deep neural networks
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– volume: 23
  start-page: 2189
  year: 2015
  end-page: 2197
  ident: bib0001
  article-title: Spatially robust far-field beamforming using the von Mises(-Fisher) distribution
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
– start-page: 76
  year: 2015
  end-page: 80
  ident: bib0041
  article-title: Scalable audio separation with light kernel additive modelling
  publication-title: Proceedings of the 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
– volume: 87
  year: 2007
  ident: bib0065
  article-title: Oracle estimators for the benchmarking of source separation algorithms
  publication-title: Signal Process.
– start-page: 7398
  year: 2013
  end-page: 7402
  ident: bib0057
  article-title: An investigation of deep neural networks for noise robust speech recognition
  publication-title: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
– start-page: 152
  year: 2011
  end-page: 157
  ident: bib0032
  article-title: ivector-based discriminative adaptation for automatic speech recognition
  publication-title: Proceedings of the 2011 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– year: 2015
  ident: bib0039
  publication-title: Robust Automatic Speech Recognition – A Bridge to Practical Applications
– volume: 9
  start-page: 310
  year: 2015
  end-page: 319
  ident: bib0061
  article-title: Improvement of microphone array characteristics for speech capturing
  publication-title: Mod. Appl. Sci.
– start-page: 532
  year: 1989
  end-page: 535
  ident: bib0024
  article-title: Some statistical issues in the comparison of speech recognition algorithms
  publication-title: Proceedings of the 1989 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
– start-page: 91
  year: 2015
  end-page: 99
  ident: bib0070
  article-title: Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR
  publication-title: Proceedings of the 12th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA)
– start-page: 2023
  year: 2001
  end-page: 2026
  ident: bib0025
  article-title: “CU-Move”: analysis & corpus development for interactive in-vehicle speech systems
  publication-title: Proceedings of Eurospeech
– start-page: 6
  year: 2014
  end-page: 10
  ident: bib0056
  article-title: Unbiased coherent-to-diffuse ratio estimation for dereverberation
  publication-title: Proceedings of the 2014 International Workshop on Acoustic Signal Enhancement (IWAENC)
– start-page: 1
  year: 2013
  end-page: 4
  ident: bib0035
  article-title: The REVERB challenge: a common evaluation framework for dereverberation and recognition of reverberant speech
  publication-title: Proceedings of the 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
– start-page: 886
  year: 2013
  end-page: 890
  ident: bib0046
  article-title: Damped oscillator cepstral coefficients for robust speech recognition
  publication-title: Proceedings of the Interspeech
– year: 2015
  ident: bib0064
  article-title: The Overview of the MELCO ASR System for the Third CHiME Challenge
  publication-title: Technical Report SVAN154551
– volume: 18
  start-page: 1830
  year: 2010
  end-page: 1840
  ident: bib0017
  article-title: Under-determined reverberant audio source separation using a full-rank spatial covariance model
  publication-title: IEEE Trans. Audio Speech Lang. Process.
– volume: 27
  start-page: 763
  year: 2013
  end-page: 779
  ident: bib0030
  article-title: Modelling non-stationary noise with spectral factorisation in automatic speech recognition
  publication-title: Comput. Speech Lang.
– start-page: 401
  year: 2015
  end-page: 408
  ident: bib0053
  article-title: Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– reference: Barfuss, H., Huemmer, C., Schwarz, A., Kellermann, W., 2015. Robust coherence-based spectral enhancement for distant speech recognition. ArXiv:
– year: 2009
  ident: bib0071
  publication-title: Distant Speech Recognition
– start-page: 275
  year: 2015
  end-page: 282
  ident: bib0054
  article-title: The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– start-page: 181
  year: 2000
  end-page: 188
  ident: bib0028
  article-title: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
  publication-title: Proceedings of ASR2000
– volume: 12
  start-page: 75
  year: 1998
  end-page: 98
  ident: bib0021
  article-title: Maximum likelihood linear transformations for HMM-based speech recognition
  publication-title: Comput. Speech Lang.
– start-page: 459
  year: 2003
  end-page: 462
  ident: bib0044
  article-title: On diagonal loading for minimum variance beamformers
  publication-title: Proceedings of the 3rd IEEE International Symposium on Signal Processing and Information Technology (ISSPIT)
– year: 2016
  ident: bib0050
  article-title: Multichannel music separation with deep neural networks
  publication-title: Proceedings of EUSIPCO
– volume: 35
  start-page: 1365
  year: 1987
  end-page: 1376
  ident: bib0014
  article-title: Robust adaptive beamforming
  publication-title: IEEE Trans. Acoust. Speech Signal Process.
– reference: Pang, Z., Zhu, F., 2015. Noise-robust ASR for the third ‘CHiME’ challenge exploiting time-frequency masking based multi-channel speech enhancement and recurrent neural network. ArXiv:
– volume: 76
  start-page: 170
  year: 2016
  end-page: 185
  ident: bib0011
  article-title: On the relationship between early-to-late ratio of room impulse responses and ASR performance in reverberant environments
  publication-title: Speech Commun.
– start-page: 547
  year: 2015
  end-page: 554
  ident: bib0026
  article-title: The automatic speech recognition in reverberant environments (ASpIRE) challenge
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– year: 2016
  ident: bib0008
  article-title: The third ‘CHiME’ speech separation and recognition challenge: analysis and outcomes
  publication-title: Comput. Speech Lang.
– start-page: 2180
  year: 2014
  end-page: 2184
  ident: bib0033
  article-title: Adaptation of deep neural network acoustic models using factorised i-vectors
  publication-title: Proceedings of Interspeech
– volume: 27
  start-page: 621
  year: 2013
  end-page: 633
  ident: bib0009
  article-title: The PASCAL CHiME speech separation and recognition challenge
  publication-title: Comput. Speech Lang.
– start-page: 309
  year: 2013
  end-page: 314
  ident: bib0031
  article-title: Elastic spectral distortion for low resource speech recognition with deep neural networks
  publication-title: Proceedings of the 2013 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– start-page: 115
  year: 2008
  end-page: 118
  ident: bib0055
  article-title: Interpretation of multiparty meetings: the AMI and AMIDA projects
  publication-title: Proceedings of the 2nd Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA)
– start-page: 436
  year: 2015
  end-page: 443
  ident: bib0073
  article-title: The NTT CHiME-3 system: advances in speech enhancement and recognition for mobile multi-microphone devices
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– year: 2010
  ident: bib0013
  publication-title: Speech Processing in Modern Communication: Challenges and Perspectives
– start-page: 1116
  year: 2013
  end-page: 1120
  ident: bib0019
  article-title: The Sheffield wargames corpus
  publication-title: Proceedings of Interspeech
– start-page: 475
  year: 2015
  end-page: 481
  ident: bib0029
  article-title: The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 15
  start-page: 2011
  year: 2007
  end-page: 2023
  ident: bib0002
  article-title: Acoustic beamforming for speaker diarization of meetings
  publication-title: IEEE Trans. Audio Speech Lang. Process.
– start-page: 496
  year: 2015
  end-page: 503
  ident: bib0004
  article-title: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 26
  start-page: 75
  year: 2009
  end-page: 80
  ident: bib0005
  article-title: Research developments and directions in speech recognition and understanding, part 1
  publication-title: IEEE Signal Process. Mag.
– year: 2015
  ident: bib0043
  article-title: Mutual Benefits of Auditory Spectro-Temporal Gabor Features and Deep Learning for the 3rd CHiME Challenge
– year: 2011
  ident: bib0052
  article-title: The Kaldi speech recognition toolkit
  publication-title: Proceedings of the IEEE 2011 Workshop on Automatic Speech Recognition and Understanding (ASRU)
– volume: 19
  start-page: 69
  year: 2010
  end-page: 84
  ident: bib0074
  article-title: Blind separation and dereverberation of speech mixtures by joint optimization
  publication-title: IEEE Trans. Audio Speech Lang. Process.
– reference: Wang, X., Wu, C., Zhang, P., Wang, Z., Liu, Y., Li, X., Fu, Q., Yan, Y., 2015. Noise robust IOA/CAS speech separation and recognition system for the third ‘CHIME’ challenge. ArXiv:
– start-page: 1045
  year: 2010
  end-page: 1048
  ident: bib0045
  article-title: Recurrent neural network based language model
  publication-title: Proceedings of Interspeech
– year: 1994
  ident: bib0038
  article-title: The translingual English database (TED)
  publication-title: Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP)
– start-page: 100
  year: 2015
  end-page: 107
  ident: bib0034
  article-title: Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures
  publication-title: Proceedings of the 12th Int. Conf. on Latent Variable Analysis and Signal Separation (LVA/ICA)
– reference: Shinoda, K., 2011. Speaker adaptation techniques for automatic speech recognition. Proceedings of the APSIPA ASC 2011.
– start-page: 347
  year: 1997
  end-page: 354
  ident: bib0018
  article-title: A post-processing system to yield reduced word error rates: recognizer output voting error reduction (ROVER)
  publication-title: Proceedings of the 1997 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 49
  start-page: 1614
  year: 2001
  end-page: 1626
  ident: bib0022
  article-title: Signal enhancement using beamforming and nonstationarity with applications to speech
  publication-title: IEEE Trans. Signal Process.
– start-page: 157
  year: 2001
  end-page: 180
  ident: bib0015
  article-title: Robust localization in reverberant rooms
  publication-title: Microphone Arrays: Signal Processing Techniques and Applications
– start-page: 83
  year: 2015
  end-page: 90
  ident: bib0012
  article-title: Noise perturbation improves supervised speech separation
  publication-title: Proceedings of the 12th International Conference on Latent Variable Analysis and Signal Separation (LVA/ICA)
– start-page: 468
  year: 2015
  end-page: 474
  ident: bib0048
  article-title: A CHiME-3 challenge system: long-term acoustic features for noise robust automatic speech recognition
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– start-page: 444
  year: 2015
  end-page: 451
  ident: bib0027
  article-title: BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge
  publication-title: Proceedings of the 2015 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
– volume: 7/8
  start-page: 439
  year: 1994
  end-page: 446
  ident: bib0059
  article-title: Suppression of coherent and incoherent noise using a microphone array
  publication-title: Ann. Telecommun.
– start-page: 1
  year: 2012
  ident: 10.1016/j.csl.2016.11.005_bib0037
  article-title: Microphone array processing for distant speech recognition: towards real-world deployment
– volume: 18
  start-page: 382
  issue: 2
  year: 2010
  ident: 10.1016/j.csl.2016.11.005_bib0042
  article-title: Model-based expectation maximization source separation and localization
  publication-title: IEEE Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASL.2009.2029711
– volume: 49
  start-page: 1614
  issue: 8
  year: 2001
  ident: 10.1016/j.csl.2016.11.005_bib0022
  article-title: Signal enhancement using beamforming and nonstationarity with applications to speech
  publication-title: IEEE Trans. Signal Process.
  doi: 10.1109/78.934132
– start-page: 468
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0048
  article-title: A CHiME-3 challenge system: long-term acoustic features for noise robust automatic speech recognition
– start-page: 444
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0027
  article-title: BLSTM supported GEV beamformer front-end for the 3rd CHiME challenge
– start-page: 1
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0035
  article-title: The REVERB challenge: a common evaluation framework for dereverberation and recognition of reverberant speech
– volume: 15
  start-page: 617
  issue: 2
  year: 2007
  ident: 10.1016/j.csl.2016.11.005_bib0016
  article-title: Superdirective beamforming robust against microphone mismatch
  publication-title: IEEE Trans. Audio Speech Lang. Process.
– volume: 27
  start-page: 763
  issue: 3
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0030
  article-title: Modelling non-stationary noise with spectral factorisation in automatic speech recognition
  publication-title: Comput. Speech Lang.
  doi: 10.1016/j.csl.2012.07.008
– start-page: 459
  year: 2003
  ident: 10.1016/j.csl.2016.11.005_bib0044
  article-title: On diagonal loading for minimum variance beamformers
– start-page: 687
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0010
  article-title: The MGB challenge: evaluating multi-genre broadcast media recognition
– volume: 23
  start-page: 2189
  issue: 12
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0001
  article-title: Spatially robust far-field beamforming using the von Mises(-Fisher) distribution
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASLP.2015.2473684
– start-page: 886
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0046
  article-title: Damped oscillator cepstral coefficients for robust speech recognition
– start-page: 171
  year: 2014
  ident: 10.1016/j.csl.2016.11.005_bib0063
  article-title: Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
– year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0039
– start-page: 7398
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0057
  article-title: An investigation of deep neural networks for noise robust speech recognition
– volume: 7/8
  start-page: 439
  year: 1994
  ident: 10.1016/j.csl.2016.11.005_bib0059
  article-title: Suppression of coherent and incoherent noise using a microphone array
  publication-title: Ann. Telecommun.
  doi: 10.1007/BF02999432
– start-page: 152
  year: 2011
  ident: 10.1016/j.csl.2016.11.005_bib0032
  article-title: ivector-based discriminative adaptation for automatic speech recognition
– year: 2015
  ident: 10.1016/j.csl.2016.11.005_sbref0041
– start-page: 532
  year: 1989
  ident: 10.1016/j.csl.2016.11.005_bib0024
  article-title: Some statistical issues in the comparison of speech recognition algorithms
– year: 2016
  ident: 10.1016/j.csl.2016.11.005_bib0050
  article-title: Multichannel music separation with deep neural networks
– year: 2016
  ident: 10.1016/j.csl.2016.11.005_sbref0007
  article-title: The third ‘CHiME’ speech separation and recognition challenge: analysis and outcomes
  publication-title: Comput. Speech Lang.
– start-page: 1045
  year: 2010
  ident: 10.1016/j.csl.2016.11.005_bib0045
  article-title: Recurrent neural network based language model
– start-page: 6
  year: 2014
  ident: 10.1016/j.csl.2016.11.005_bib0056
  article-title: Unbiased coherent-to-diffuse ratio estimation for dereverberation
– year: 2009
  ident: 10.1016/j.csl.2016.11.005_bib0071
– start-page: 423
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0067
  article-title: Speech enhancement using beamforming and non negative matrix factorization for robust speech recognition in the CHiME-3 challenge
– year: 2007
  ident: 10.1016/j.csl.2016.11.005_bib0023
– start-page: 100
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0034
  article-title: Adaptive denoising autoencoders: a fine-tuning scheme to learn from test mixtures
– volume: 21
  start-page: 65
  issue: 1
  year: 2014
  ident: 10.1016/j.csl.2016.11.005_bib0072
  article-title: An experimental study on speech enhancement based on deep neural networks
  publication-title: IEEE Signal Process. Lett.
  doi: 10.1109/LSP.2013.2291240
– year: 2012
  ident: 10.1016/j.csl.2016.11.005_bib0066
– start-page: 496
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0004
  article-title: Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition
– volume: 76
  start-page: 170
  year: 2016
  ident: 10.1016/j.csl.2016.11.005_bib0011
  article-title: On the relationship between early-to-late ratio of room impulse responses and ASR performance in reverberant environments
  publication-title: Speech Commun.
  doi: 10.1016/j.specom.2015.09.004
– start-page: 181
  year: 2000
  ident: 10.1016/j.csl.2016.11.005_bib0028
  article-title: The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
– start-page: 482
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0060
  article-title: Robust ASR using neural network based speech enhancement and feature simulation
– year: 2011
  ident: 10.1016/j.csl.2016.11.005_bib0052
  article-title: The kaldi speech recognition toolkit
– volume: 22
  start-page: 1849
  issue: 12
  year: 2014
  ident: 10.1016/j.csl.2016.11.005_bib0069
  article-title: On training targets for supervised speech separation
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASLP.2014.2352935
– start-page: 2180
  year: 2014
  ident: 10.1016/j.csl.2016.11.005_bib0033
  article-title: Adaptation of deep neural network acoustic models using factorised i-vectors
– start-page: 1116
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0019
  article-title: The Sheffield wargames corpus
– start-page: 436
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0073
  article-title: The NTT CHiME-3 system: advances in speech enhancement and recognition for mobile multi-microphone devices
– start-page: 347
  year: 1997
  ident: 10.1016/j.csl.2016.11.005_bib0018
  article-title: A post-processing system to yield reduced word error rates: recognizer output voting error reduction (ROVER)
– start-page: 475
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0029
  article-title: The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition
– start-page: 547
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0026
  article-title: The automatic speech recognition in reverberant environments (ASpIRE) challenge
– volume: 5
  start-page: 2578
  year: 1988
  ident: 10.1016/j.csl.2016.11.005_bib0075
  article-title: A microphone array with adaptive post-filtering for noise reduction in reverberant rooms
– volume: 26
  start-page: 75
  issue: 3
  year: 2009
  ident: 10.1016/j.csl.2016.11.005_bib0005
  article-title: Research developments and directions in speech recognition and understanding, part 1
  publication-title: IEEE Signal Process. Mag.
  doi: 10.1109/MSP.2009.932166
– volume: 15
  start-page: 2011
  issue: 7
  year: 2007
  ident: 10.1016/j.csl.2016.11.005_bib0002
  article-title: Acoustic beamforming for speaker diarization of meetings
  publication-title: IEEE Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASL.2007.902460
– volume: 24
  start-page: 1652
  year: 2016
  ident: 10.1016/j.csl.2016.11.005_bib0049
  article-title: Multichannel audio source separation with deep neural networks
  publication-title: IEEE/ACM Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASLP.2016.2580946
– year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0064
  article-title: The Overview of the MELCO ASR System for the Third CHiME Challenge
– start-page: 504
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0007
  article-title: The third ‘CHiME’ speech separation and recognition challenge: dataset, task and baselines
– volume: 1
  start-page: 181
  year: 1995
  ident: 10.1016/j.csl.2016.11.005_bib0036
  article-title: Improved backing-off for m-gram language modeling
– volume: 26
  start-page: 52
  issue: 1
  year: 2011
  ident: 10.1016/j.csl.2016.11.005_bib0062
  article-title: The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments
  publication-title: Comput. Speech Lang.
  doi: 10.1016/j.csl.2010.12.003
– ident: 10.1016/j.csl.2016.11.005_bib0051
– start-page: 416
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0020
  article-title: Unified ASR system using LGM-based source separation, noise-robust feature extraction, and word hypothesis selection
– start-page: 275
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0054
  article-title: The DIRHA-English corpus and related tasks for distant-speech recognition in domestic environments
– start-page: 309
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0031
  article-title: Elastic spectral distortion for low resource speech recognition with deep neural networks
– volume: 9
  start-page: 310
  issue: 6
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0061
  article-title: Improvement of microphone array characteristics for speech capturing
  publication-title: Mod. Appl. Sci.
– ident: 10.1016/j.csl.2016.11.005_bib0068
– start-page: 83
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0012
  article-title: Noise perturbation improves supervised speech separation
– start-page: 76
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0041
  article-title: Scalable audio separation with light kernel additive modelling
– year: 2010
  ident: 10.1016/j.csl.2016.11.005_bib0013
– volume: 27
  start-page: 621
  issue: 3
  year: 2013
  ident: 10.1016/j.csl.2016.11.005_bib0009
  article-title: The PASCAL CHiME speech separation and recognition challenge
  publication-title: Comput. Speech Lang.
  doi: 10.1016/j.csl.2012.10.004
– start-page: 2023
  year: 2001
  ident: 10.1016/j.csl.2016.11.005_bib0025
  article-title: “CU-Move”: analysis & corpus development for interactive in-vehicle speech systems
– volume: 87
  issue: 8
  year: 2007
  ident: 10.1016/j.csl.2016.11.005_sbref0061
  article-title: Oracle estimators for the benchmarking of source separation algorithms
  publication-title: Signal Process.
  doi: 10.1016/j.sigpro.2007.01.016
– ident: 10.1016/j.csl.2016.11.005_bib0058
– start-page: 1749
  year: 2014
  ident: 10.1016/j.csl.2016.11.005_bib0047
  article-title: Medium duration modulation cepstral feature for robust speech recognition
– volume: 12
  start-page: 75
  issue: 2
  year: 1998
  ident: 10.1016/j.csl.2016.11.005_bib0021
  article-title: Maximum likelihood linear transformations for HMM-based speech recognition
  publication-title: Comput. Speech Lang.
  doi: 10.1006/csla.1998.0043
– volume: 19
  start-page: 69
  issue: 1
  year: 2010
  ident: 10.1016/j.csl.2016.11.005_bib0074
  article-title: Blind separation and dereverberation of speech mixtures by joint optimization
  publication-title: IEEE Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASL.2010.2045183
– volume: 35
  start-page: 1365
  issue: 10
  year: 1987
  ident: 10.1016/j.csl.2016.11.005_bib0014
  article-title: Robust adaptive beamforming
  publication-title: IEEE Trans. Acoust. Speech Signal Process.
  doi: 10.1109/TASSP.1987.1165054
– start-page: 157
  year: 2001
  ident: 10.1016/j.csl.2016.11.005_bib0015
  article-title: Robust localization in reverberant rooms
– volume: 11
  start-page: 1157
  year: 2003
  ident: 10.1016/j.csl.2016.11.005_bib0003
  article-title: Equivalence between frequency-domain blind source separation and frequency-domain adaptive beamforming for convolutive mixtures
  publication-title: EURASIP J. Appl. Signal Process.
  doi: 10.1155/S1110865703305074
– start-page: 91
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0070
  article-title: Speech enhancement with LSTM recurrent neural networks and its application to noise-robust ASR
– start-page: 401
  year: 2015
  ident: 10.1016/j.csl.2016.11.005_bib0053
  article-title: Adaptive beamforming and adaptive training of DNN acoustic models for enhanced multichannel noisy speech recognition
– start-page: 115
  year: 2008
  ident: 10.1016/j.csl.2016.11.005_bib0055
  article-title: Interpretation of multiparty meetings: the AMI and AMIDA projects
– ident: 10.1016/j.csl.2016.11.005_bib0006
– volume: 18
  start-page: 1830
  issue: 7
  year: 2010
  ident: 10.1016/j.csl.2016.11.005_bib0017
  article-title: Under-determined reverberant audio source separation using a full-rank spatial covariance model
  publication-title: IEEE Trans. Audio Speech Lang. Process.
  doi: 10.1109/TASL.2010.2050716
– ident: 10.1016/j.csl.2016.11.005_bib0040
– year: 1994
  ident: 10.1016/j.csl.2016.11.005_bib0038
  article-title: The translingual English database (TED)
StartPage 535
SubjectTerms Computer Science
Microphone array
Robust ASR
Signal and Image Processing
Speech enhancement
Train/test mismatch
Title An analysis of environment, microphone and data simulation mismatches in robust speech recognition
URI https://dx.doi.org/10.1016/j.csl.2016.11.005
https://inria.hal.science/hal-01399180
Volume 46