Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding
Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach where outputs of secondary systems are integrated in the search algorithm of a primary one. In this driven d...
Saved in:
Published in | IEEE transactions on audio, speech, and language processing Vol. 21; no. 6; pp. 1251 - 1260 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
Piscataway, NJ
IEEE
01.06.2013
Institute of Electrical and Electronics Engineers |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of the outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach where outputs of secondary systems are integrated in the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined to others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate DDA significantly outperforms vote-based approaches: we obtain an improvement of 14.5% relative word error rate over the best single-systems, as opposed to the the 6.7% with a ROVER combination. An in-depth analysis of the DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm. The application of DDA to both and beam-search-based decoder yields similar performances. |
---|---|
ISSN: | 1558-7916 1558-7924 |
DOI: | 10.1109/TASL.2013.2248716 |