Statistical Testing on ASR Performance via Blockwise Bootstrap
Main Authors | Liu, Zhe; Peng, Fuchun |
---|---|
Format | Journal Article |
Language | English |
Published | 19.12.2019 |
Subjects | Computer Science - Learning; Statistics - Machine Learning |
Abstract | A common question in automatic speech recognition (ASR) evaluation
is how reliable an observed word error rate (WER) improvement between two ASR
systems is. Statistical hypothesis testing and confidence intervals (CIs) can
be used to tell whether the improvement is real or due only to random chance.
The bootstrap resampling method has been popular for such significance
analysis because it is intuitive and easy to use. However, this method fails
on dependent data, which is prevalent in speech: for example, ASR performance
on utterances from the same speaker can be correlated. In this paper we
present a blockwise bootstrap approach: the evaluation utterances are divided
into nonoverlapping blocks, and the method resamples these blocks instead of
the original data. We show that the resulting variance estimator of the
absolute WER difference between two ASR systems is consistent under mild
conditions. We also demonstrate the validity of the blockwise bootstrap method
on both synthetic and real-world speech data. |
---|---|
Author | Peng, Fuchun; Liu, Zhe |
Author_xml | – sequence: 1 givenname: Zhe surname: Liu fullname: Liu, Zhe – sequence: 2 givenname: Fuchun surname: Peng fullname: Peng, Fuchun |
BackLink | https://doi.org/10.48550/arXiv.1912.09508 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY EPD GOX |
DOI | 10.48550/arxiv.1912.09508 |
DatabaseName | arXiv Computer Science arXiv Statistics arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 1912_09508 |
GroupedDBID | AKY EPD GOX |
IEDL.DBID | GOX |
IngestDate | Mon Jan 08 05:49:25 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
OpenAccessLink | https://arxiv.org/abs/1912.09508 |
ParticipantIDs | arxiv_primary_1912_09508 |
PublicationCentury | 2000 |
PublicationDate | 2019-12-19 |
PublicationDateYYYYMMDD | 2019-12-19 |
PublicationDate_xml | – month: 12 year: 2019 text: 2019-12-19 day: 19 |
PublicationDecade | 2010 |
PublicationYear | 2019 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning Statistics - Machine Learning |
Title | Statistical Testing on ASR Performance via Blockwise Bootstrap |
URI | https://arxiv.org/abs/1912.09508 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Cornell University |
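
The abstract describes the blockwise bootstrap only in words: group the evaluation utterances into nonoverlapping blocks and resample the blocks rather than the individual utterances. The following is a minimal sketch of that idea, not the authors' implementation. The function name, its parameters, the choice of speaker IDs as block labels, and the percentile confidence interval are all assumptions made here for illustration.

```python
import numpy as np


def blockwise_bootstrap_wer_diff(errors_a, errors_b, ref_words, block_ids,
                                 n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap the absolute WER difference (A minus B) by resampling blocks.

    All names and defaults are illustrative assumptions, not from the paper.
    errors_a, errors_b: per-utterance word error counts for systems A and B.
    ref_words: per-utterance reference word counts.
    block_ids: per-utterance block label (for example, a speaker ID).
    """
    err_diff = (np.asarray(errors_a, dtype=float)
                - np.asarray(errors_b, dtype=float))
    ref_words = np.asarray(ref_words, dtype=float)

    # Map each utterance to an integer block index, then aggregate per block.
    _, block_index = np.unique(np.asarray(block_ids), return_inverse=True)
    n_blocks = int(block_index.max()) + 1
    diff_per_block = np.bincount(block_index, weights=err_diff,
                                 minlength=n_blocks)
    words_per_block = np.bincount(block_index, weights=ref_words,
                                  minlength=n_blocks)

    # Observed WER difference on the full evaluation set.
    observed = diff_per_block.sum() / words_per_block.sum()

    # Resample whole blocks with replacement, so utterances that share a
    # block (and may be correlated) always stay together.
    rng = np.random.default_rng(seed)
    draws = rng.integers(0, n_blocks, size=(n_boot, n_blocks))
    replicates = (diff_per_block[draws].sum(axis=1)
                  / words_per_block[draws].sum(axis=1))

    # Bootstrap standard error and a percentile confidence interval.
    std_error = replicates.std(ddof=1)
    ci_low, ci_high = np.quantile(replicates, [alpha / 2, 1 - alpha / 2])
    return observed, std_error, (ci_low, ci_high)
```

Under these assumptions, a call such as `blockwise_bootstrap_wer_diff(errors_a, errors_b, ref_words, speaker_ids)` returns the observed WER difference, its bootstrap standard error, and an interval; an interval that excludes zero would suggest the difference is unlikely to be due to chance alone. Setting every utterance to its own block reduces the sketch to the ordinary utterance-level bootstrap.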