Training data selection for improving discriminative training of acoustic models

Bibliographic Details
Published in	Pattern Recognition Letters, Vol. 30, no. 13, pp. 1228-1235
Main Authors	Chen, Berlin; Liu, Shih-Hung; Chu, Fang-Hui
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V., 01.10.2009
Subjects	Accuracy; Acoustic models; Applied sciences; Audiovisual document; Automatic transcription; Cable television; Continuous speech recognition; Data selection; Discriminant analysis; Discriminative training; Entropy; Exact sciences and technology; Information, signal and communications theory; Learning; News; Performance evaluation; Phone accuracy; Posterior probability; Signal processing; Speech processing; Speech recognition; Telecommunications and information theory
Summary:	This paper considers training data selection for discriminative training of acoustic models for large vocabulary continuous speech recognition (LVCSR). Three novel data selection approaches are proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance is used for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice is investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice is explored. The underlying characteristics of the presented approaches are extensively investigated, and their performance is verified by comparison with standard discriminative training approaches. Experiments conducted on a broadcast news speech transcription task show that phone- and frame-level data selection can cut the turnaround time for acoustic model training by more than half while still yielding a comparably good set of discriminative acoustic models.
ISSN:	0167-8655; 1872-7344
DOI:	10.1016/j.patrec.2009.05.009
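
Illustration:	The frame-level criterion named in the summary (normalized entropy of Gaussian posterior probabilities) can be sketched roughly as follows. This is a minimal sketch and not the authors' implementation: the function names, the normalization by the log of the number of candidate Gaussians, the threshold value, and the choice to keep high-entropy frames are all assumptions made for illustration; the record itself does not specify them.

```python
import math


def normalized_entropy(posteriors):
    """Entropy of one frame's posterior distribution, scaled to [0, 1].

    `posteriors` holds the posterior probabilities of the candidate
    Gaussians for a single frame (hypothetical input layout).
    Dividing by log(N) is an assumed normalization.
    """
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posteriors must contain positive mass")
    n = len(posteriors)
    if n < 2:
        return 0.0  # a single candidate carries no uncertainty
    # Renormalize and skip zero entries, since 0 * log(0) is taken as 0.
    probs = [p / total for p in posteriors if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(n)


def select_frames(frame_posteriors, threshold=0.5):
    """Return indices of frames whose normalized entropy >= threshold.

    Keeping the high-entropy (more confusable) frames is an assumption
    of this sketch, as is the default threshold.
    """
    return [
        t
        for t, post in enumerate(frame_posteriors)
        if normalized_entropy(post) >= threshold
    ]
```

With such a filter, frames whose posterior mass sits almost entirely on one Gaussian would be skipped when accumulating training statistics, which is one plausible way the reported reduction in training time could arise.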