Efficient spoken term detection using confusion networks

In this paper, we present a fast, vocabulary independent algorithm for spoken term detection (STD) that demonstrates a word-based index is sufficient to achieve good performance for both in-vocabulary (IV) and out-of-vocabulary (OOV) terms. Previous approaches have required that a separate index be...

Full description

Saved in:
Bibliographic Details
Published in2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 7844 - 7848
Main Authors Mangu, Lidia, Kingsbury, Brian, Soltau, Hagen, Hong-Kwang Kuo, Picheny, Michael
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.05.2014
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In this paper, we present a fast, vocabulary independent algorithm for spoken term detection (STD) that demonstrates a word-based index is sufficient to achieve good performance for both in-vocabulary (IV) and out-of-vocabulary (OOV) terms. Previous approaches have required that a separate index be built at the sub-word level and then expanded to allow for matching OOV terms. Such a process, while accurate, is expensive in both time and memory. In the proposed architecture, a word-level confusion network (CN) based index is used for both IV and OOV search. This is implemented using a flexible WFST framework. Comparisons on 3 Babel languages (Tagalog, Pashto and Turkish) show that CN-based indexing results in better performance compared with the lattice approach while being orders of magnitude faster and having a much smaller footprint.
ISSN:1520-6149
2379-190X
DOI:10.1109/ICASSP.2014.6855127