Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results
Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, café and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particula...
Published in | 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), pp. 452-459 |
---|---|
Main Authors | Pfeifenberger, Lukas; Schrank, Tobias; Zohrer, Matthias; Hagmuller, Martin; Pernkopf, Franz |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.12.2015 |
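Subjects | Array signal processing; Artificial neural networks; automatic speech recognition; deep postfilter; Microphones; multi-channel speech processing; Speech; Speech enhancement; Speech recognition |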
Online Access | Get full text |
Abstract | Recognizing speech under noisy conditions is an ill-posed problem. The CHiME 3 challenge targets robust speech recognition in realistic environments such as street, bus, café and pedestrian areas. We study variants of beamformers used for pre-processing multi-channel speech recordings. In particular, we investigate three variants of generalized side-lobe canceller (GSC) beamformers: GSC with sparse blocking matrix (BM), GSC with adaptive BM (ABM), and GSC with minimum variance distortionless response (MVDR) and ABM. Furthermore, we apply several post-filters to further enhance the speech signal. We introduce MaxPower postfilters and deep neural postfilters (DPFs). DPFs significantly outperformed our baseline systems in terms of the overall perceptual score (OPS) and the perceptual evaluation of speech quality (PESQ). In particular, DPFs achieved an average relative improvement of 17.54% in OPS and 18.28% in PESQ compared to the CHiME 3 baseline. DPFs also achieved the best WER when combined with an ASR engine on simulated development and evaluation data, namely 8.98% and 10.82%, respectively. The proposed MaxPower beamformer achieved the best overall WER on CHiME 3 real development and evaluation data, namely 14.23% and 22.12%, respectively. |
---|---|
Author | Pfeifenberger, Lukas Pernkopf, Franz Zohrer, Matthias Hagmuller, Martin Schrank, Tobias |
Author_xml | – sequence: 1 givenname: Lukas surname: Pfeifenberger fullname: Pfeifenberger, Lukas email: lukas.pfeifenberger@alumni.tugraz.at organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria – sequence: 2 givenname: Tobias surname: Schrank fullname: Schrank, Tobias email: tobias.schrank@tugraz.at organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria – sequence: 3 givenname: Matthias surname: Zohrer fullname: Zohrer, Matthias email: matthias.zoehrer@tugraz.at organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria – sequence: 4 givenname: Martin surname: Hagmuller fullname: Hagmuller, Martin email: hagmueller@tugraz.at organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria – sequence: 5 givenname: Franz surname: Pernkopf fullname: Pernkopf, Franz email: pernkopf@tugraz.at organization: Signal Process. & Speech Commun. Lab., Graz Univ. of Technol., Graz, Austria |
ContentType | Conference Proceeding |
DOI | 10.1109/ASRU.2015.7404830 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan All Online (POP All Online) 1998-present by volume IEEE Xplore All Conference Proceedings IEL IEEE Proceedings Order Plans (POP All) 1998-Present |
DeliveryMethod | fulltext_linktorsrc |
EISBN | 1479972916 9781479972913 |
EndPage | 459 |
ExternalDocumentID | 7404830 |
Genre | orig-research |
IEDL.DBID | RIE |
IngestDate | Thu Jun 29 18:36:07 EDT 2023 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
OpenAccessLink | https://www.spsc.tugraz.at/sites/default/files/pfeifenberger-2.pdf |
PageCount | 8 |
ParticipantIDs | ieee_primary_7404830 |
PublicationCentury | 2000 |
PublicationDate | 20151201 |
PublicationDateYYYYMMDD | 2015-12-01 |
PublicationDate_xml | – month: 12 year: 2015 text: 20151201 day: 01 |
PublicationDecade | 2010 |
PublicationTitle | 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) |
PublicationTitleAbbrev | ASRU |
PublicationYear | 2015 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 452 |
SubjectTerms | Array signal processing Artificial neural networks automatic speech recognition deep postfilter Microphones multi-channel speech processing Speech Speech enhancement Speech recognition |
Title | Multi-channel speech processing architectures for noise robust speech recognition: 3rd CHiME challenge results |
URI | https://ieeexplore.ieee.org/document/7404830 |
hasFullText | 1 |
inHoldings | 1 |
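
The abstract reports word error rates (WER) and relative improvements in OPS and PESQ. As a minimal sketch of how such figures are commonly computed (this is not the authors' evaluation code, and the function names `word_error_rate` and `relative_improvement` are hypothetical), WER can be taken as the word-level edit distance divided by the number of reference words, and relative improvement as the percentage change over a baseline score:

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion of a reference word
                            curr[j - 1] + 1,      # insertion of a hypothesis word
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def relative_improvement(baseline: float, system: float) -> float:
    """Percentage change of a higher-is-better score relative to the baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (system - baseline) / baseline


if __name__ == "__main__":
    # Made-up sentences and scores for illustration only; not data from the paper.
    ref = "the cat sat on the mat".split()
    hyp = "the cat sit on mat".split()
    print(f"WER: {100.0 * word_error_rate(ref, hyp):.2f}%")
    print(f"Relative improvement: {relative_improvement(2.0, 2.5):.2f}%")
```

For a lower-is-better metric such as WER, the sign convention of `relative_improvement` would need to be flipped; the record does not state which convention the authors used.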