Statistical Testing on ASR Performance via Blockwise Bootstrap

Bibliographic Details
Main Authors Liu, Zhe; Peng, Fuchun
Format Journal Article
Language English
Published 19.12.2019
Online Access Get full text

Abstract A common question in automatic speech recognition (ASR) evaluation is how reliable an observed word error rate (WER) improvement between two ASR systems is. Statistical hypothesis testing and confidence intervals (CIs) can be used to tell whether the improvement is real or due only to random chance. The bootstrap resampling method has been popular for such significance analysis because it is intuitive and easy to use. However, this method fails on dependent data, which is prevalent in speech: for example, ASR performance on utterances from the same speaker can be correlated. In this paper we present a blockwise bootstrap approach: the evaluation utterances are divided into nonoverlapping blocks, and the method resamples these blocks instead of the original data. We show that the resulting variance estimator of the absolute WER difference between two ASR systems is consistent under mild conditions. We also demonstrate the validity of the blockwise bootstrap method on both synthetic and real-world speech data.
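The resampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the input layout (one tuple per utterance with a block id such as a speaker id), the sign convention (WER of system B minus WER of system A), and the percentile interval are all assumptions made for illustration. Only the core step, resampling whole blocks with replacement instead of individual utterances, comes from the abstract.

```python
import random
import statistics
from collections import defaultdict


def blockwise_bootstrap_wer_diff(utts, n_boot=1000, alpha=0.05, seed=0):
    """Hypothetical helper, not from the paper.

    utts: iterable of (block_id, n_ref_words, errors_a, errors_b),
          one entry per utterance; block_id could be a speaker id.
    Returns (observed_diff, bootstrap_std_error, (ci_low, ci_high)),
    where diff = WER(system B) - WER(system A).
    """
    # Pool utterances into nonoverlapping blocks: [words, errors_a, errors_b].
    blocks = defaultdict(lambda: [0, 0, 0])
    for block_id, n_words, err_a, err_b in utts:
        totals = blocks[block_id]
        totals[0] += n_words
        totals[1] += err_a
        totals[2] += err_b
    block_list = list(blocks.values())

    def wer_diff(sample):
        # Absolute WER difference computed over a list of blocks.
        words = sum(b[0] for b in sample)
        return (sum(b[2] for b in sample) - sum(b[1] for b in sample)) / words

    observed = wer_diff(block_list)

    # Resample blocks (not utterances) with replacement.
    rng = random.Random(seed)
    k = len(block_list)
    diffs = sorted(
        wer_diff(rng.choices(block_list, k=k)) for _ in range(n_boot)
    )

    # Assumption: a simple percentile interval; other interval types exist.
    low = diffs[int((alpha / 2) * n_boot)]
    high = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return observed, statistics.stdev(diffs), (low, high)
```

Under these assumptions, an interval that excludes zero would suggest that the observed WER difference is unlikely to be due to chance alone. Grouping by speaker is only one possible blocking choice.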
BackLink https://doi.org/10.48550/arXiv.1912.09508 (View paper in arXiv)
ContentType Journal Article
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
Copyright_xml – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.1912.09508
DatabaseName arXiv Computer Science
arXiv Statistics
arXiv.org
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
OpenAccessLink https://arxiv.org/abs/1912.09508
PublicationDate 2019-12-19
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Learning
Statistics - Machine Learning
URI https://arxiv.org/abs/1912.09508