Dynamic Combination of Automatic Speech Recognition Systems by Driven Decoding

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, no. 6, pp. 1251-1260
Main Authors	Lecouteux, B.; Linares, G.; Esteve, Y.; Gravier, G.
Format	Journal Article
Language	English
Published	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 01.06.2013
Subjects	Acoustics; Adaptation models; Applied sciences; Automatic speech recognition; Coding, codes; Computer Science; Decoding; Exact sciences and technology; Hidden Markov models; Information, signal and communications theory; Multimedia; Pragmatics; Signal and communications theory; Signal processing; Speech; Speech processing; Speech recognition; system combination; Telecommunications and information theory; Performance evaluation; Error rate; Cable television; Search algorithm; French; Audiovisual document; News; Integrated system; Relative error; Robustness; Automatic recognition

More Information
Summary:	Combining automatic speech recognition (ASR) systems generally relies on the posterior merging of their outputs or on acoustic cross-adaptation. In this paper, we propose an integrated approach in which the outputs of secondary systems are integrated into the search algorithm of a primary one. In this driven decoding algorithm (DDA), the secondary systems are viewed as observation sources that should be evaluated and combined with the others by a primary search algorithm. DDA is evaluated on a subset of the ESTER I corpus consisting of 4 hours of French radio broadcast news. Results demonstrate that DDA significantly outperforms vote-based approaches: we obtain a relative word error rate improvement of 14.5% over the best single system, as opposed to 6.7% with a ROVER combination. An in-depth analysis of DDA shows its ability to improve robustness (gains are greater in adverse conditions) and a relatively low dependency on the search algorithm: applying DDA to both A* and beam-search-based decoders yields similar performance.
ISSN:	1558-7916, 1558-7924
DOI:	10.1109/TASL.2013.2248716
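
Illustration:	The summary describes treating a secondary system's output as an extra observation source for the primary search, and reports gains as relative word error rate improvements. The sketch below is only a toy under stated assumptions, not the authors' algorithm: it rescores a small n-best list after decoding instead of integrating the secondary output into the search itself, and the names agreement, driven_rescore, weight, and relative_reduction, the example sentences, and all scores are hypothetical.

```python
from difflib import SequenceMatcher


def agreement(hypothesis, auxiliary):
    """Word-level similarity between two transcripts, in [0, 1]."""
    return SequenceMatcher(None, hypothesis.split(), auxiliary.split()).ratio()


def driven_rescore(nbest, auxiliary, weight):
    """Return the hypothesis maximizing log_score + weight * agreement.

    nbest: (text, log_score) pairs from a hypothetical primary decoder.
    auxiliary: one-best transcript from a hypothetical secondary system.
    weight: hypothetical trade-off between the two evidence sources.
    """
    best_text, _ = max(
        nbest, key=lambda h: h[1] + weight * agreement(h[0], auxiliary)
    )
    return best_text


def relative_reduction(baseline_wer, combined_wer):
    """Relative WER improvement of a combined system over a baseline."""
    return (baseline_wer - combined_wer) / baseline_wer


# Hypothetical data: the primary decoder slightly prefers a wrong word.
nbest = [
    ("the whether is fine today", -10.0),
    ("the weather is fine today", -10.5),
]
auxiliary = "the weather was fine today"

print(driven_rescore(nbest, auxiliary, weight=0.0))  # primary scores only
print(driven_rescore(nbest, auxiliary, weight=5.0))  # with secondary evidence

# Hypothetical absolute WERs (the record gives only relative figures).
print(round(relative_reduction(20.0, 17.1), 3))
```

With weight set to 0.0 the choice depends on the primary scores alone. A positive weight lets agreement with the secondary transcript change the ranking, which is the intuition behind using secondary outputs as additional evidence. The last line shows how a relative figure such as 14.5% relates to absolute word error rates, using made-up values.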