Query-by-example keyword spotting using long short-term memory networks

Bibliographic Details
Published in	2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5236-5240
Main Authors	Chen, Guoguo; Parada, Carolina; Sainath, Tara N.
Format	Conference Proceeding
Language	English
Published	IEEE 01.04.2015
Subjects	Acoustics; Computational modeling; Feature extraction; Hidden Markov models; Noise; Speech; Speech processing

More Information
Summary:	We present a novel approach to query-by-example keyword spotting (KWS) using a long short-term memory (LSTM) recurrent neural network-based feature extractor. In our approach, we represent each keyword using a fixed-length feature vector obtained by running the keyword audio through a word-based LSTM acoustic model. We use the activations prior to the softmax layer of the LSTM as our keyword vector. At runtime, we detect the keyword by extracting the same feature vector from a sliding window and computing a simple similarity score between this test vector and the keyword vector. With clean speech, we achieve an 86% relative reduction in false rejection rate at a 0.5% false alarm rate when compared to a competitive KWS system based on phoneme posteriorgrams with dynamic time warping, while the reduction in the presence of babble noise is 67%. Our system has a small memory footprint, low computational cost, and high precision, making it suitable for on-device applications.
ISSN:	1520-6149
DOI:	10.1109/ICASSP.2015.7178970
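
Illustrative sketch: the detection step described in the summary (embed each sliding window as a fixed-length vector, then score it against the stored keyword vector) might look roughly like the Python below. This is not the authors' implementation. The summary only says "a simple similarity score", so cosine similarity is an assumption here, and `mean_pool_embed` is a hypothetical placeholder for the LSTM acoustic model whose pre-softmax activations serve as the feature vector. The names `spot_keyword`, `window`, `hop`, and `threshold` are likewise invented for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]
Vector = List[float]


def mean_pool_embed(frames: Sequence[Frame]) -> Vector:
    """Placeholder extractor: average the frames into one fixed-length vector.

    ASSUMPTION: stands in for the LSTM acoustic model; the real system uses
    the activations prior to the softmax layer instead of a frame average.
    """
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; an assumed choice of 'simple similarity score'."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0.0 else 0.0


def spot_keyword(
    stream: Sequence[Frame],
    keyword_vec: Sequence[float],
    window: int,
    threshold: float,
    embed: Callable[[Sequence[Frame]], Vector] = mean_pool_embed,
    hop: int = 1,
) -> List[Tuple[int, float]]:
    """Return (start_frame, score) for every window scoring >= threshold."""
    hits = []
    for start in range(0, len(stream) - window + 1, hop):
        test_vec = embed(stream[start:start + window])
        score = cosine_similarity(test_vec, keyword_vec)
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy example with 2-dimensional "frames" (made-up data, not real audio).
keyword_vec = mean_pool_embed([[1.0, 0.0], [1.0, 0.0]])
stream = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
hits = spot_keyword(stream, keyword_vec, window=2, threshold=0.9)
print(hits)
```

In this toy run only the window starting at frame 2 matches the keyword vector closely enough to clear the threshold. In practice the threshold would be tuned to trade false alarms against false rejections, which is the operating point the summary reports results at.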