Language Model Adaptation Based on Topic Probability of Latent Dirichlet Allocation

Bibliographic Details
Published in: ETRI Journal, Vol. 38, No. 3, pp. 487-493
Main Authors: Jeon, Hyung-Bae; Lee, Soo-Young
Format: Journal Article
Language: Korean
Published: Electronics and Telecommunications Research Institute (ETRI), 30.06.2016
Summary: Two new methods are proposed for unsupervised adaptation of a language model (LM) from a single sentence in automatic transcription tasks. In the training phase, training documents are clustered by Latent Dirichlet allocation (LDA), and a domain-specific LM is trained for each cluster. In the test phase, the adapted LM is formed as a linear mixture of the trained domain-specific LMs. Unlike previous adaptation methods, the proposed methods fully utilize the trained LDA model to estimate the weights assigned to the domain-specific LMs; because the same trained LDA model drives both the clustering and the weight estimation, the two steps are consistent and reliable. In continuous speech recognition benchmark tests, the proposed methods outperform other unsupervised LM adaptation methods based on latent semantic analysis, non-negative matrix factorization, and LDA with n-gram counting.
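
As a concrete illustration of the test-phase procedure described in the summary, the following is a minimal Python sketch: topic weights for a single input sentence are derived from a trained LDA model's topic-word distributions, and those weights linearly interpolate the domain-specific LMs. It uses toy unigram LMs and a simplified point-estimate topic posterior in place of full LDA inference; all names and probability values are illustrative assumptions, not the paper's actual models.

from collections import Counter
import math

# Illustrative toy setup: K topics from a hypothetical trained LDA model.
# topic_word_probs[k][w] approximates p(w | topic k); topic_lms[k][w] is the
# unigram probability from the LM trained on the documents LDA assigned to
# cluster k. A real system would use n-gram LMs and proper LDA inference.
K = 2
topic_word_probs = [
    {"speech": 0.4, "model": 0.3, "match": 0.3},
    {"soccer": 0.5, "match": 0.3, "model": 0.2},
]
topic_lms = [
    {"speech": 0.5, "model": 0.3, "match": 0.2},
    {"soccer": 0.6, "match": 0.3, "model": 0.1},
]
topic_priors = [0.5, 0.5]  # p(topic k), e.g. derived from LDA's alpha

def topic_weights(sentence, eps=1e-12):
    # Bayes-rule point estimate of p(topic | sentence) over the LDA
    # topic-word distributions (a simplification of full LDA inference).
    counts = Counter(sentence.split())
    log_post = []
    for k in range(K):
        lp = math.log(topic_priors[k])
        for w, c in counts.items():
            lp += c * math.log(topic_word_probs[k].get(w, eps))
        log_post.append(lp)
    m = max(log_post)  # subtract max before exp for numerical stability
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def adapted_prob(word, weights, eps=1e-12):
    # Adapted LM: linear mixture of the domain-specific LMs, weighted by
    # the LDA-derived topic weights for the current sentence.
    return sum(wt * topic_lms[k].get(word, eps) for k, wt in enumerate(weights))

weights = topic_weights("soccer match")  # weights concentrate on topic 1
print(weights)
print(adapted_prob("match", weights))

Because the same LDA statistics drive both the clustering at training time and the weighting at test time, the mixture weights adapt to each sentence without any supervision, which is the core idea the summary attributes to the proposed methods.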
Bibliography: KISTI1.1003/JNL.JAKO201658139713123
ISSN: 1225-6463, 2233-7326