Two-stage spoken term detection system for under-resourced languages

Bibliographic Details
Published in	IET Signal Processing Vol. 14; no. 9; pp. 602-613
Main Authors	G, Deekshitha; Mary, Leena
Format	Journal Article
Language	English
Published	The Institution of Engineering and Technology 01.12.2020
Subjects	ASR-free; automatic speech recognition based sequence matching; available annotated corpora; erroneous label sequences; feature extraction; feature level template matching; feature sequence template matching; given speech database; image matching; labelled corpora; labelled story database; longer query words; natural language processing; phoneme label sequence matching; probability; probable query locations; query length; query processing; Research Article; search database; sequence matching technique; speech processing; speech recognition; spoken queries; stage spoken term detection system; STD method; STD system; STD task; template matching approach; template matching methods work; template matching techniques; time 3.5 hour; time complexity; two-stage STD

More Information
Summary:	Spoken Term Detection (STD) is the process of locating the occurrences of spoken queries in a given speech database. Generally, two methods are adopted for STD: ASR-based sequence matching and ASR-free, feature-based template matching. If a well-performing ASR is available, the former method is accurate. However, building an ASR with consistent performance requires several hours of labelled corpora. Template matching methods work well for short or chopped utterances. In practice, however, the search database can be huge, containing sentences of varying lengths. The time complexity of template matching techniques is then high, which makes them impractical for realistic search applications. In this work, a two-stage STD system is proposed that combines ASR-based phoneme sequence matching in the first stage with feature sequence template matching at selected locations in the second stage. The time complexity of the second stage is reduced by performing dynamic time warping (DTW)-based template matching only at the probable query locations identified by the first stage. A ‘split and match’ approach helps to reduce false positives for longer query words. The effectiveness of the proposed method is demonstrated on English and Malayalam datasets.
ISSN:	1751-9675; 1751-9683
DOI:	10.1049/iet-spr.2019.0131
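The summary describes the two-stage pipeline only at a high level. The Python sketch below illustrates the general idea under stated assumptions and is not the authors' implementation. Stage 1 searches the decoded phoneme label sequence for approximate (edit-distance) matches of the query phonemes, so that some ASR label errors are tolerated. Stage 2 runs subsequence DTW with Euclidean frame distance only inside frame windows around those candidates. The function names, the per-label frame alignment `label_frames`, and the `max_cost`, `margin` and `threshold` parameters are hypothetical, as is the length normalisation of the DTW score. The ‘split and match’ step for longer queries is not shown.

```python
import math


def phoneme_candidates(query, labels, max_cost):
    """Stage 1 (sketch): approximate substring search of the query phonemes
    in a decoded label sequence. Returns (start, end, cost) label spans,
    end exclusive, whose edit distance to the query is <= max_cost."""
    m, n = len(query), len(labels)
    # cost[i][j]: cheapest edit cost aligning query[:i] with labels ending at j
    cost = [[0.0] * (n + 1) for _ in range(m + 1)]
    # start[i][j]: label index where that cheapest alignment begins;
    # row 0 lets a match begin at any label position
    start = [list(range(n + 1)) for _ in range(m + 1)]
    for i in range(1, m + 1):
        cost[i][0] = float(i)  # every query phoneme so far deleted
        for j in range(1, n + 1):
            options = [
                # match (cost 0) or substitution (cost 1)
                (cost[i - 1][j - 1] + (query[i - 1] != labels[j - 1]), start[i - 1][j - 1]),
                (cost[i - 1][j] + 1, start[i - 1][j]),  # query phoneme missing from labels
                (cost[i][j - 1] + 1, start[i][j - 1]),  # spurious extra label
            ]
            # min() keeps the first option on ties, preferring match/substitution
            cost[i][j], start[i][j] = min(options, key=lambda o: o[0])
    return [(start[m][j], j, cost[m][j])
            for j in range(1, n + 1) if cost[m][j] <= max_cost]


def subsequence_dtw(q, x):
    """Stage 2 (sketch): best length-normalised DTW cost of the query frames q
    against any contiguous subsequence of the segment frames x."""
    m, n = len(q), len(x)
    D = [[float("inf")] * n for _ in range(m)]
    for j in range(n):
        D[0][j] = math.dist(q[0], x[j])  # free start anywhere in x
    for i in range(1, m):
        for j in range(n):
            best_prev = D[i - 1][j]  # vertical step
            if j > 0:
                best_prev = min(best_prev, D[i][j - 1], D[i - 1][j - 1])
            D[i][j] = math.dist(q[i], x[j]) + best_prev
    return min(D[m - 1]) / m  # free end, normalised by query length


def two_stage_std(query_phones, query_feats, labels, label_frames, feats,
                  max_cost=1, margin=5, threshold=0.5):
    """labels: decoded phonemes of one utterance; label_frames: hypothetical
    (start_frame, end_frame) per label; feats: per-frame feature vectors."""
    windows = set()
    for s, e, _ in phoneme_candidates(query_phones, labels, max_cost):
        if e <= s:
            continue  # degenerate span with no labels consumed
        lo = max(0, label_frames[s][0] - margin)
        hi = min(len(feats), label_frames[e - 1][1] + margin)
        windows.add((lo, hi))
    hits = []
    for lo, hi in sorted(windows):  # DTW only at the probable locations
        score = subsequence_dtw(query_feats, feats[lo:hi])
        if score <= threshold:
            hits.append((lo, hi, score))
    return hits
```

For example, with labels `['k', 'ae', 't', 's', 'ih', 't', 'd', 'ao', 'g']`, ten frames per label, and `query_phones=['d', 'ao', 'g']`, stage 1 proposes only the last three labels when `max_cost=0`, and DTW then runs over frames 55 to 90 rather than the whole utterance. That restriction to candidate windows is where this sketch saves time relative to scanning every frame with DTW.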