Semi-supervised and unsupervised discriminative language model training for automatic speech recognition

Bibliographic Details
Published in: Speech Communication, Vol. 83, pp. 54–63
Main Authors: Dikici, Erinç; Saraçlar, Murat
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.10.2016
Summary:

Highlights:
• We investigate supervised, semi-supervised and unsupervised training of DLMs.
• We use supervised and unsupervised confusion models to generate artificial data.
• We propose three target output selection methods for unsupervised DLM training.
• The ranking perceptron performs better than the structured perceptron in most cases.
• Significant gains in ASR accuracy are obtained with unmatched acoustic and text data.

Abstract:
Discriminative language modeling aims to reduce error rates by rescoring the output of an automatic speech recognition (ASR) system. Discriminative language model (DLM) training conventionally follows a supervised approach, using acoustic recordings together with their manual (reference) transcriptions as training data; recognition performance improves as the amount of such matched data increases. In this study we investigate the case where matched data for DLM training is limited or not available at all, and explore methods to improve ASR accuracy by incorporating acoustic and text data that come from separate sources. For semi-supervised training, we utilize a confusion model to generate artificial hypotheses instead of the real ASR N-best lists. For unsupervised training, we propose three target output selection methods to stand in for the missing reference. We handle this task both as a structured prediction and as a reranking problem, and employ two variants of the WER-sensitive perceptron algorithm. We show that a significant improvement over baseline ASR accuracy is obtained even when there is no transcribed acoustic data available to train the DLM.
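To make the reranking setup concrete, below is a minimal sketch of a WER-sensitive ranking perceptron for N-best rescoring. It is not the authors' implementation: the n-gram feature map, the `wer` scoring hook, and the WER-scaled update are illustrative assumptions; in the unsupervised setting described in the abstract, the `wer` callable would be an estimate derived from one of the paper's target output selection methods rather than from a manual reference.

```python
from collections import Counter

def features(hyp):
    """Hypothetical feature map: unigram and bigram counts of the hypothesis."""
    words = hyp.split()
    feats = Counter((w,) for w in words)          # unigram features
    feats.update(zip(words, words[1:]))           # bigram features
    return feats

def score(weights, hyp):
    """Linear model score: dot product of the weight vector and features."""
    return sum(weights.get(f, 0.0) * v for f, v in features(hyp).items())

def perceptron_epoch(weights, nbest_lists, wer):
    """One training pass of a WER-sensitive ranking perceptron (illustrative).

    nbest_lists: iterable of N-best lists (each a list of hypothesis strings).
    wer: callable returning the word error rate of a hypothesis; supervised
         training computes it against a reference transcription, while the
         unsupervised case would substitute a selected target output.
    """
    for nbest in nbest_lists:
        target = min(nbest, key=wer)                        # lowest-WER hypothesis
        best = max(nbest, key=lambda h: score(weights, h))  # model's current choice
        delta = wer(best) - wer(target)                     # WER-sensitive step size
        if delta > 0:
            for f, v in features(target).items():           # promote the target
                weights[f] = weights.get(f, 0.0) + delta * v
            for f, v in features(best).items():             # demote the current best
                weights[f] = weights.get(f, 0.0) - delta * v
    return weights
```

In the semi-supervised variant summarized above, the `nbest_lists` fed to such a learner would be artificial hypotheses produced by a confusion model applied to text-only data, rather than real ASR N-best output.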
ISSN: 0167-6393; 1872-7182
DOI: 10.1016/j.specom.2016.07.004