Semi-supervised and unsupervised discriminative language model training for automatic speech recognition

Bibliographic Details
Published in: Speech Communication, Vol. 83, pp. 54–63
Main Authors: Dikici, Erinç; Saraçlar, Murat
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.10.2016
Summary:

Highlights:
• We investigate supervised, semi-supervised and unsupervised training of DLMs.
• We use supervised and unsupervised confusion models to generate artificial data.
• We propose three target output selection methods for unsupervised DLM training.
• The ranking perceptron performs better than the structured perceptron in most cases.
• Significant gains in ASR accuracy are obtained with unmatched acoustic and text data.

Abstract:
Discriminative language modeling aims to reduce error rates by rescoring the output of an automatic speech recognition (ASR) system. Discriminative language model (DLM) training conventionally follows a supervised approach, using acoustic recordings together with their manual (reference) transcriptions as training data; recognition performance improves as the amount of such matched data increases. In this study we investigate the case where matched data for DLM training is limited or not available at all, and explore methods to improve ASR accuracy by incorporating acoustic and text data that come from separate sources. For semi-supervised training, we utilize a confusion model to generate artificial hypotheses instead of the real ASR N-best lists. For unsupervised training, we propose three target output selection methods to stand in for the missing reference. We handle this task both as a structured prediction and as a reranking problem, and employ two variants of the WER-sensitive perceptron algorithm. We show that a significant improvement over baseline ASR accuracy is obtained even when there is no transcribed acoustic data available to train the DLM.
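To make the reranking setup concrete, below is a minimal sketch of a WER-sensitive ranking perceptron for N-best rescoring. It is not the authors' implementation: the n-gram feature map, the `wer` scoring hook, and the WER-scaled update are illustrative assumptions; in the unsupervised setting described in the abstract, the `wer` callable would be an estimate derived from one of the paper's target output selection methods rather than from a manual reference.

```python
from collections import Counter

def features(hyp):
    """Hypothetical feature map: unigram and bigram counts of the hypothesis."""
    words = hyp.split()
    feats = Counter((w,) for w in words)          # unigram features
    feats.update(zip(words, words[1:]))           # bigram features
    return feats

def score(weights, hyp):
    """Linear model score: dot product of the weight vector and features."""
    return sum(weights.get(f, 0.0) * v for f, v in features(hyp).items())

def perceptron_epoch(weights, nbest_lists, wer):
    """One training pass of a WER-sensitive ranking perceptron (illustrative).

    nbest_lists: iterable of N-best lists (each a list of hypothesis strings).
    wer: callable returning the word error rate of a hypothesis; supervised
         training computes it against a reference transcription, while the
         unsupervised case would substitute a selected target output.
    """
    for nbest in nbest_lists:
        target = min(nbest, key=wer)                        # lowest-WER hypothesis
        best = max(nbest, key=lambda h: score(weights, h))  # model's current choice
        delta = wer(best) - wer(target)                     # WER-sensitive step size
        if delta > 0:
            for f, v in features(target).items():           # promote the target
                weights[f] = weights.get(f, 0.0) + delta * v
            for f, v in features(best).items():             # demote the current best
                weights[f] = weights.get(f, 0.0) - delta * v
    return weights
```

In the semi-supervised variant summarized above, the `nbest_lists` fed to such a learner would be artificial hypotheses produced by a confusion model applied to text-only data, rather than real ASR N-best output.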
ISSN: 0167-6393; 1872-7182
DOI: 10.1016/j.specom.2016.07.004