Unsupervised Discovery of Structured Acoustic Tokens With Applications to Spoken Term Detection

Bibliographic Details
Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, No. 2, pp. 394-405
Main Authors: Chung, Cheng-Tao; Lee, Lin-Shan
Format: Journal Article
Language: English
Published: IEEE, 01.02.2018

Summary: In this paper, we compare two paradigms for unsupervised discovery of structured acoustic tokens directly from speech corpora without any human annotation. The multigranular paradigm seeks to capture all available information in the corpora with multiple sets of tokens for different model granularities. The hierarchical paradigm attempts to jointly learn several levels of signal representations in a hierarchical structure. The two paradigms are unified within a theoretical framework in this paper. Query-by-example spoken term detection (QbE-STD) experiments on the Query by Example Search on Speech Task dataset of MediaEval 2015 verify the competitiveness of the acoustic tokens. The enhanced relevance score proposed in this work improves both paradigms for the task of QbE-STD. We also list results on the ABX evaluation task of the Zero Resource Challenge 2015 for comparison of the paradigms.
ISSN: 2329-9290, 2329-9304
DOI: 10.1109/TASLP.2017.2778948