Query-by-example keyword spotting using long short-term memory networks

We present a novel approach to query-by-example keyword spotting (KWS) using a long short-term memory (LSTM) recurrent neural network-based feature extractor. In our approach, we represent each keyword using a fixed-length feature vector obtained by running the keyword audio through a word-based LST...

Full description

Saved in:
Bibliographic Details
Published in2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 5236 - 5240
Main Authors Guoguo Chen, Parada, Carolina, Sainath, Tara N.
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.04.2015
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:We present a novel approach to query-by-example keyword spotting (KWS) using a long short-term memory (LSTM) recurrent neural network-based feature extractor. In our approach, we represent each keyword using a fixed-length feature vector obtained by running the keyword audio through a word-based LSTM acoustic model. We use the activations prior to the softmax layer of the LSTM as our keyword-vector. At runtime, we detect the keyword by extracting the same feature vector from a sliding window and computing a simple similarity score between this test vector and the keyword vector. With clean speech, we achieve 86% relative false rejection rate reduction at 0.5% false alarm rate when compared to a competitive phoneme posteriorgram with dynamic time warping KWS system, while the reduction in the presence of babble noise is 67%. Our system has a small memory footprint, low computational cost, and high precision, making it suitable for on-device applications.
ISSN:1520-6149
DOI:10.1109/ICASSP.2015.7178970