Hallucinated n-best lists for discriminative language modeling


Bibliographic Details
Published in: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5001 - 5004
Main Authors: Sagae, K., Lehr, M., Prud'hommeaux, E., Xu, P., Glenn, N., Karakos, D., Khudanpur, S., Roark, B., Saraclar, M., Shafran, I., Bikel, D., Callison-Burch, C., Cao, Y., Hall, K., Hasler, E., Koehn, P., Lopez, A., Post, M., Riley, D.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.03.2012

Summary: This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training with "real" n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts (similar to methods from machine translation for extracting phrase tables) yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
ISBN: 1467300454, 9781467300452
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2012.6289043
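The training loop described in the summary (perceptron learning of n-gram weights to rerank an n-best list toward the reference) can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the `wer_proxy` error measure, the hand-built "hallucinated" n-best list, and the unigram/bigram feature set are all simplified assumptions standing in for the paper's ASR-scale setup.

```python
# Toy sketch of perceptron training for a discriminative n-gram language
# model over a "hallucinated" n-best list (assumptions: simplified features,
# mismatch-count error instead of true WER, no averaging).
from collections import Counter

def ngram_features(words, n=2):
    """Count unigram and bigram features of a hypothesis."""
    feats = Counter(words)
    for i in range(len(words) - n + 1):
        feats[tuple(words[i:i + n])] += 1
    return feats

def wer_proxy(hyp, ref):
    """Crude error measure: positional mismatches plus length difference
    (a real system would use edit distance / WER)."""
    return sum(h != r for h, r in zip(hyp, ref)) + abs(len(hyp) - len(ref))

def train_perceptron(data, epochs=5):
    """data: list of (reference, n_best) pairs; returns feature weights."""
    w = Counter()
    for _ in range(epochs):
        for ref, n_best in data:
            # Oracle = hypothesis in the list closest to the reference.
            oracle = min(n_best, key=lambda h: wer_proxy(h, ref))
            # Current model's top-scoring hypothesis.
            best = max(n_best, key=lambda h: sum(
                w[f] * c for f, c in ngram_features(h).items()))
            if best != oracle:
                # Standard perceptron update: promote oracle, demote best.
                for f, c in ngram_features(oracle).items():
                    w[f] += c
                for f, c in ngram_features(best).items():
                    w[f] -= c
    return w

# Hallucinated n-best list for the reference "the cat sat".
ref = ["the", "cat", "sat"]
n_best = [["a", "cat", "sat"], ["the", "cat", "sat"], ["the", "bat", "sat"]]
w = train_perceptron([(ref, n_best)])
top = max(n_best, key=lambda h: sum(
    w[f] * c for f, c in ngram_features(h).items()))
```

In the paper's semi-supervised setting, the n-best lists are simulated from reference text (e.g. via phrasal cohorts) rather than produced by a recognizer; the training update itself is unchanged.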