Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results

Bibliographic Details
Published in 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 452-459
Main Authors Pfeifenberger, Lukas; Schrank, Tobias; Zohrer, Matthias; Hagmuller, Martin; Pernkopf, Franz
Format Conference Proceeding
Language English
Published IEEE 01.12.2015

Abstract Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, café, and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of generalized sidelobe canceller (GSC) beamformers, namely GSC with a sparse blocking matrix (BM), GSC with an adaptive BM (ABM), and GSC with a minimum variance distortionless response (MVDR) beamformer and ABM. We also apply several postfilters to further enhance the speech signal, introducing MaxPower postfilters and deep neural postfilters (DPFs). DPFs significantly outperformed our baseline systems in terms of the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved average relative improvements of 17.54% in OPS and 18.28% in PESQ over the CHiME 3 baseline. When combined with an ASR engine, DPFs also achieved the best word error rates (WERs) on the simulated development and evaluation data, namely 8.98% and 10.82%, respectively. The proposed MaxPower beamformer achieved the best overall WER on the CHiME 3 real development and evaluation data, namely 14.23% and 22.12%, respectively.
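The abstract names the MVDR beamformer without stating its formula. For one frequency bin, the textbook MVDR weights are w = R⁻¹d / (dᴴR⁻¹d), where R is the noise covariance matrix and d is the steering vector of the target direction. These weights keep the look direction undistorted (wᴴd = 1) while minimizing residual noise power. The sketch below is a generic single-bin MVDR, not the authors' GSC/ABM implementation. The uniform linear array geometry, 5 cm spacing, 1 kHz bin, interferer angle, and noise covariance are hypothetical illustration values, and `solve`, `mvdr_weights`, and `steering_vector` are hypothetical helper names. It uses plain Python complex arithmetic, so it runs without extra libraries.

```python
import cmath
import math


def solve(a, b):
    """Solve a @ x = b for a small square complex matrix (Gaussian elimination, partial pivoting)."""
    n = len(a)
    # Augmented working copy so the caller's matrix is left untouched.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot magnitude for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0j] * n
    for r in reversed(range(n)):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def mvdr_weights(noise_cov, steering):
    """Textbook MVDR for one frequency bin: w = R^{-1} d / (d^H R^{-1} d)."""
    r_inv_d = solve(noise_cov, steering)
    denom = sum(di.conjugate() * ri for di, ri in zip(steering, r_inv_d))
    return [ri / denom for ri in r_inv_d]


def steering_vector(num_mics, spacing_m, angle_rad, freq_hz, c=343.0):
    """Far-field steering vector of a hypothetical uniform linear array."""
    delay = spacing_m * math.sin(angle_rad) / c  # inter-mic delay in seconds
    return [cmath.exp(-2j * math.pi * freq_hz * k * delay) for k in range(num_mics)]


# Hypothetical scenario: 4 mics, 5 cm spacing, 1 kHz bin, target at broadside (0 deg),
# one interferer at 60 deg plus weak uncorrelated sensor noise (variance 0.01).
NUM_MICS, SPACING, FREQ = 4, 0.05, 1000.0
d = steering_vector(NUM_MICS, SPACING, 0.0, FREQ)
v = steering_vector(NUM_MICS, SPACING, math.radians(60), FREQ)
noise_cov = [[(0.01 if i == j else 0.0) + v[i] * v[j].conjugate()
              for j in range(NUM_MICS)] for i in range(NUM_MICS)]

w = mvdr_weights(noise_cov, d)
# Beamformer response w^H a toward the target (d) and toward the interferer (v).
gain_target = sum(wi.conjugate() * di for wi, di in zip(w, d))
gain_interf = sum(wi.conjugate() * vi for wi, vi in zip(w, v))
print(f"|w^H d| = {abs(gain_target):.6f}, |w^H v| = {abs(gain_interf):.4f}")
```

The target response stays at unit gain, as required by the distortionless constraint, while the interferer is strongly attenuated. In a GSC, a fixed beamformer of this kind typically forms the upper branch, and a blocking matrix with an adaptive filter removes residual noise. How the paper combines MVDR with its ABM is not detailed in this record.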
Author_xml – sequence: 1
  givenname: Lukas
  surname: Pfeifenberger
  fullname: Pfeifenberger, Lukas
  email: lukas.pfeifenberger@alumni.tugraz.at
  organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
– sequence: 2
  givenname: Tobias
  surname: Schrank
  fullname: Schrank, Tobias
  email: tobias.schrank@tugraz.at
  organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
– sequence: 3
  givenname: Matthias
  surname: Zohrer
  fullname: Zohrer, Matthias
  email: matthias.zoehrer@tugraz.at
  organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
– sequence: 4
  givenname: Martin
  surname: Hagmuller
  fullname: Hagmuller, Martin
  email: hagmueller@tugraz.at
  organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
– sequence: 5
  givenname: Franz
  surname: Pernkopf
  fullname: Pernkopf, Franz
  email: pernkopf@tugraz.at
  organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria
ContentType Conference Proceeding
DBID 6IE
6IL
CBEJK
RIE
RIL
DOI 10.1109/ASRU.2015.7404830
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEL
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEL
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
EISBN 1479972916
9781479972913
EndPage 459
ExternalDocumentID 7404830
Genre orig-research
IEDL.DBID RIE
IngestDate Thu Jun 29 18:36:07 EDT 2023
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
OpenAccessLink https://www.spsc.tugraz.at/sites/default/files/pfeifenberger-2.pdf
PageCount 8
ParticipantIDs ieee_primary_7404830
PublicationCentury 2000
PublicationDate 20151201
PublicationDateYYYYMMDD 2015-12-01
PublicationDate_xml – month: 12
  year: 2015
  text: 20151201
  day: 01
PublicationDecade 2010
PublicationTitle 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
PublicationTitleAbbrev ASRU
PublicationYear 2015
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 452
SubjectTerms Array signal processing
Artificial neural networks
automatic speech recognition
deep postfilter
Microphones
multi-channel speech processing
Speech
Speech enhancement
Speech recognition
URI https://ieeexplore.ieee.org/document/7404830
hasFullText 1
inHoldings 1