Web-Based Language Model Domain Adaptation for Real World Voice Retrieval

Bibliographic Details
Published in: 2013 Ninth International Conference on Computational Intelligence and Security, pp. 100-104
Main Authors: Mengzhe Chen, Qingqing Zhang, Zhichao Wang, Jielin Pan, Yonghong Yan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.12.2013

Summary: This paper presents our recent work on the development of a real-world voice retrieval system that automatically updates language models for a specific domain with the latest web data. The paper tackles two of the main difficulties in building such a system. First, users of voice retrieval systems often enter newly created "hot words" as keywords, so improving the recognition accuracy of these hot words is important for the quality of the user experience. Second, in our applications the retrieval domain is given in advance, so the system must automatically select in-domain data from web data and use it to update domain-specific language models. To address these issues, the system obtains the latest text training data by searching for web data related to the top-ranking hot words. Based on these data, a block-based language modeling method is proposed to build and update domain-specific language models automatically. In addition, high-frequency in-domain words and phrases are added to the lexicon to keep it up to date. Experiments on a voice retrieval dataset collected from real-world users showed that the system's updates achieved consistent improvements in in-domain voice retrieval recognition.
DOI:10.1109/CIS.2013.28
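The summary does not say how hot-word-related data or "high-frequency" lexicon entries are chosen. The following is a minimal Python sketch of those two steps under stated assumptions, not the authors' implementation: documents are assumed to be already word-segmented into whitespace-separated tokens, the hot-word list is assumed to be sorted by rank, "phrases" are assumed to be adjacent word n-grams joined with an underscore, and a fixed absolute count threshold stands in for "high frequency". The names `select_hot_word_documents`, `frequent_ngrams`, `update_lexicon`, `top_k`, `max_n`, and `min_count` are hypothetical. The paper's block-based language modeling method is not reproduced here.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable


def select_hot_word_documents(
    documents: Iterable[str], hot_words: list[str], top_k: int
) -> list[str]:
    """Keep documents containing at least one of the top_k ranked hot words.

    Hypothetical filter: hot_words is assumed to be sorted by rank, and each
    document is assumed to be pre-segmented, whitespace-separated text.
    """
    targets = set(hot_words[:top_k])
    selected = []
    for doc in documents:
        # Keep the document if any of its tokens is a top-ranked hot word.
        if targets & set(doc.split()):
            selected.append(doc)
    return selected


def frequent_ngrams(
    documents: Iterable[str], max_n: int = 2, min_count: int = 2
) -> list[str]:
    """Return word n-grams (n = 1..max_n) seen at least min_count times.

    Multi-word phrases are joined with '_' so each can be one lexicon entry.
    """
    counts: Counter[str] = Counter()
    for doc in documents:
        tokens = doc.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts["_".join(tokens[i : i + n])] += 1
    return sorted(gram for gram, c in counts.items() if c >= min_count)


def update_lexicon(
    lexicon: set[str],
    documents: Iterable[str],
    max_n: int = 2,
    min_count: int = 2,
) -> set[str]:
    """Return a new lexicon extended with frequent words/phrases not yet in it."""
    new_entries = {
        gram
        for gram in frequent_ngrams(documents, max_n, min_count)
        if gram not in lexicon
    }
    return lexicon | new_entries
```

For example, with the hot words `["phone", "festival"]`, the toy documents `"new phone launch today"`, `"phone launch event tickets"`, `"weather forecast rain"`, and `"new phone price"`, the filter drops the weather document, and updating the lexicon `{"new", "phone", "weather"}` adds `launch`, `new_phone`, and `phone_launch`. A real system would more likely use a threshold relative to corpus size and a proper segmenter for Chinese text; the absolute count here only keeps the sketch short.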