Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

Bibliographic Details
Published in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8545 - 8549
Main Authors: Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2013

Summary: Recently, the posteriorgram-based template matching framework has been successfully applied to query-by-example spoken term detection tasks for low-resource languages. This framework employs a tokenizer to derive posteriorgrams, and applies dynamic time warping (DTW) to the posteriorgrams to locate the possible occurrences of a query term. Building on this framework, we propose to improve detection performance by using multiple tokenizers with DTW distance matrix combination. The proposed approach uses multiple tokenizers in parallel as the front-end to generate different posteriorgram representations, and combines the distance matrices of the different posteriorgrams into a single matrix. DTW detection is then applied to the combined distance matrix. Lastly, score post-processing techniques, including pseudo-relevance feedback and score normalization, are used for further improvement. Experiments were conducted on the spoken web search datasets of MediaEval 2011 and MediaEval 2012. Experimental results show that combining multiple tokenizers significantly outperforms the best single tokenizer, and that the DTW matrix combination method consistently outperforms the score combination method when more than three tokenizers are involved. Score post-processing techniques yield further gains on top of using multiple tokenizers.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2013.6639333
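The pipeline the summary describes can be sketched in a few lines: each tokenizer yields a posteriorgram (one posterior vector per frame), a frame-level distance matrix is computed per tokenizer, the matrices are averaged into a single matrix, and DTW is run on the result. The sketch below is a minimal, hedged illustration, not the authors' implementation: the negative-log inner-product distance and the uniform weighting are common choices assumed here, and the function names are invented for this example.

```python
import numpy as np

def frame_distance_matrix(query_post, test_post, eps=1e-10):
    """Frame-level distance between two posteriorgrams (frames x states).

    Uses the negative-log inner product, a common posteriorgram distance;
    eps guards against log(0)."""
    inner = query_post @ test_post.T
    return -np.log(np.maximum(inner, eps))

def combine_matrices(matrices, weights=None):
    """Combine per-tokenizer distance matrices into one by weighted average."""
    stacked = np.stack(matrices)                       # (tokenizers, n, m)
    if weights is None:
        weights = np.full(len(matrices), 1.0 / len(matrices))
    return np.tensordot(weights, stacked, axes=1)      # (n, m)

def dtw_cost(cost):
    """Length-normalized DTW alignment cost over a precomputed cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)
```

In a query-by-example setting, `dtw_cost` would be applied in a sliding-window fashion over the test utterance to locate candidate occurrences; the matrix-level combination shown here is what distinguishes this approach from combining per-tokenizer DTW scores after detection.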