Using parallel tokenizers with DTW matrix combination for low-resource spoken term detection

Bibliographic Details
Published in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8545 - 8549
Main Authors: Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2013

Summary: Recently, the posteriorgram-based template matching framework has been successfully applied to query-by-example spoken term detection tasks for low-resource languages. This framework employs a tokenizer to derive posteriorgrams, and applies dynamic time warping (DTW) to the posteriorgrams to locate the possible occurrences of a query term. Building on this framework, we propose to improve detection performance by using multiple tokenizers with DTW distance matrix combination. The proposed approach uses multiple tokenizers in parallel as the front-end to generate different posteriorgram representations, and combines the distance matrices of the different posteriorgrams into a single matrix. DTW detection is then applied to the combined distance matrix. Lastly, score post-processing techniques, including pseudo-relevance feedback and score normalization, are used for further improvement. Experiments were conducted on the spoken web search datasets of MediaEval 2011 and MediaEval 2012. Experimental results show that combining multiple tokenizers significantly outperforms the best single tokenizer, and that the DTW matrix combination method consistently outperforms the score combination method when more than three tokenizers are involved. Score post-processing techniques yield further gains on top of using multiple tokenizers.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2013.6639333
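The pipeline the summary describes can be sketched in a few lines: each tokenizer yields a posteriorgram (one posterior vector per frame), a frame-level distance matrix is computed per tokenizer, the matrices are averaged into a single matrix, and DTW is run on the result. The sketch below is a minimal, hedged illustration, not the authors' implementation: the negative-log inner-product distance and the uniform weighting are common choices assumed here, and the function names are invented for this example.

```python
import numpy as np

def frame_distance_matrix(query_post, test_post, eps=1e-10):
    """Frame-level distance between two posteriorgrams (frames x states).

    Uses the negative-log inner product, a common posteriorgram distance;
    eps guards against log(0)."""
    inner = query_post @ test_post.T
    return -np.log(np.maximum(inner, eps))

def combine_matrices(matrices, weights=None):
    """Combine per-tokenizer distance matrices into one by weighted average."""
    stacked = np.stack(matrices)                       # (tokenizers, n, m)
    if weights is None:
        weights = np.full(len(matrices), 1.0 / len(matrices))
    return np.tensordot(weights, stacked, axes=1)      # (n, m)

def dtw_cost(cost):
    """Length-normalized DTW alignment cost over a precomputed cost matrix."""
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i, j] = cost[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[n, m] / (n + m)
```

In a query-by-example setting, `dtw_cost` would be applied in a sliding-window fashion over the test utterance to locate candidate occurrences; the matrix-level combination shown here is what distinguishes this approach from combining per-tokenizer DTW scores after detection.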