Web-Based Language Model Domain Adaptation for Real World Voice Retrieval
Published in | 2013 Ninth International Conference on Computational Intelligence and Security pp. 100 - 104 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 01.12.2013 |
Subjects | |
Summary: | This paper presents our recent work on a real-world voice retrieval system that automatically updates the language models for a specific domain with the latest web data. It addresses two of the main difficulties in building such a system. First, users of voice retrieval systems often speak newly created "hot words" as keywords, so improving the recognition of these hot words is important for the quality of the user experience. Second, in our applications the retrieval domain is given, which raises the problem of how to automatically select in-domain data from web data and update the domain-specific language models. To address these issues, the system obtains the latest text training data by searching the web for data related to the top-ranking hot words. Based on these data, a block-based language modeling method is proposed to automatically build and update domain-specific language models. In addition, high-frequency in-domain words and phrases are added to the lexicon. Experimental results on a dataset of real-world users' voice retrieval queries show that updating with our system yields consistent improvements in in-domain voice retrieval recognition. |
---|---|
DOI: | 10.1109/CIS.2013.28 |
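
Illustrative sketch: the summary describes three steps (selecting in-domain text from web data, updating the lexicon with high-frequency in-domain words, and combining models built from separate blocks of data) but this record gives no algorithmic detail. The Python below is therefore only a rough illustration under stated assumptions and is not the authors' method. The vocabulary-overlap filter, the count threshold, and the linear interpolation of per-block unigram models are stand-in techniques, and all function names and parameters (`select_in_domain`, `update_lexicon`, `interpolate_unigrams`, `min_overlap`, `min_count`) are hypothetical.

```python
from collections import Counter


def select_in_domain(sentences, domain_vocab, min_overlap=0.5):
    """Keep sentences whose share of tokens found in domain_vocab is high enough.

    Assumption: a simple vocabulary-overlap filter stands in for whatever
    in-domain selection criterion the paper actually uses.
    """
    kept = []
    for sentence in sentences:
        tokens = sentence.lower().split()
        if not tokens:
            continue
        overlap = sum(1 for t in tokens if t in domain_vocab) / len(tokens)
        if overlap >= min_overlap:
            kept.append(tokens)
    return kept


def update_lexicon(lexicon, in_domain_sentences, min_count=2):
    """Return a new lexicon extended with words seen at least min_count times."""
    counts = Counter(t for tokens in in_domain_sentences for t in tokens)
    new_words = {w for w, c in counts.items() if c >= min_count}
    return set(lexicon) | new_words


def interpolate_unigrams(block_counts, weights):
    """Linearly interpolate one unigram model per data block.

    Assumption: each "block" is a batch of text with its own word counts;
    weights should sum to 1 so the result is a probability distribution.
    """
    if len(block_counts) != len(weights):
        raise ValueError("need exactly one weight per block")
    totals = [sum(counts.values()) for counts in block_counts]
    if any(total == 0 for total in totals):
        raise ValueError("every block must contain at least one token")
    vocab = set().union(*block_counts)
    return {
        word: sum(
            weight * counts[word] / total
            for counts, total, weight in zip(block_counts, totals, weights)
        )
        for word in vocab
    }
```

In this sketch, newer blocks of web data could be given larger interpolation weights so that recent hot words gain probability mass; how the paper actually defines and weights its blocks is not stated in this record.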