Web-Based Language Model Domain Adaptation for Real World Voice Retrieval

Bibliographic Details
Published in	2013 Ninth International Conference on Computational Intelligence and Security, pp. 100-104
Main Authors	Mengzhe Chen, Qingqing Zhang, Zhichao Wang, Jielin Pan, Yonghong Yan
Format	Conference Proceeding
Language	English
Published	IEEE 01.12.2013
Subjects	Adaptation models; block-based language model; Data models; domain-specific language model; Entertainment industry; Hidden Markov models; Speech recognition; Training; Training data; voice retrieval
Summary:	This paper presents our recent work on developing a real-world voice retrieval system that automatically updates language models for a specific domain with the latest web data. The paper tackles two of the main difficulties in building such a system. First, when people use voice retrieval systems, newly created "hot words" are input as keywords, so improving recognition performance on these hot words is important for the quality of the user experience. Second, in our applications the retrieval domain is given, so the system must automatically select in-domain data from web data and use it to update domain-specific language models. To address these issues, the system obtains the latest text training data by searching for web data related to the top-ranking hot words. Based on these data, a block-based language modeling method is proposed to automatically build and update domain-specific language models. Meanwhile, in-domain high-frequency words and phrases are added to the lexicon. Experimental results on a dataset of real-world users' voice retrieval queries show that updating the system yields consistent improvements in in-domain voice retrieval recognition.
DOI:	10.1109/CIS.2013.28
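
The lexicon-updating step mentioned in the summary (adding in-domain high-frequency words to the lexicon) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `update_lexicon`, the `min_count` threshold, and whitespace tokenization are all assumptions made for illustration, and a real recognition lexicon would also need a pronunciation for each added word.

```python
from collections import Counter


def update_lexicon(lexicon, in_domain_sentences, min_count=2):
    """Return a new lexicon with frequent in-domain words added.

    Hypothetical helper: `min_count` is an assumed frequency threshold,
    and whitespace splitting stands in for a real word segmenter.
    """
    # Count how often each token occurs in the in-domain text.
    counts = Counter(
        word
        for sentence in in_domain_sentences
        for word in sentence.lower().split()
    )
    # Keep tokens that are frequent enough and not yet in the lexicon.
    new_words = {
        word
        for word, count in counts.items()
        if count >= min_count and word not in lexicon
    }
    return set(lexicon) | new_words


# Toy data standing in for web text retrieved for a hot word.
lexicon = {"play", "the", "movie"}
web_text = [
    "play the new blockbuster trailer",
    "blockbuster trailer released today",
    "watch the blockbuster",
]
updated = update_lexicon(lexicon, web_text, min_count=2)
print(sorted(updated))
```

In this toy run only words seen at least twice in the retrieved text are added, so rare tokens stay out of the lexicon. The summary does not state how the paper defines "high frequency", so the threshold here is purely illustrative.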