Deep Spoken Keyword Spotting: An Overview

Bibliographic Details
Published in	IEEE Access, Vol. 10, pp. 4169-4199
Main Authors	Lopez-Espejo, Ivan; Tan, Zheng-Hua; Hansen, John H. L.; Jensen, Jesper
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Subjects	acoustic model; Acoustics; Automatic speech recognition; Computational modeling; Decoding; Deep learning; Electronic devices; Feature extraction; Hidden Markov models; Keyword spotting; Keywords; Literature reviews; Machine learning; Performance evaluation; robustness; small footprint; Speech recognition; Virtual assistants; Viterbi algorithm; Voice recognition

More Information
Summary:	Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams and has become a fast-growing technology thanks to the paradigm shift introduced by deep learning a few years ago. This has allowed the rapid embedding of deep KWS in a myriad of small electronic devices with different purposes like the activation of voice assistants. Prospects suggest a sustained growth in terms of social use of this technology. Thus, it is not surprising that deep KWS has become a hot research topic among speech scientists, who constantly look for KWS performance improvement and computational complexity reduction. This context motivates this paper, in which we conduct a literature review into deep spoken KWS to assist practitioners and researchers who are interested in this technology. Specifically, this overview has a comprehensive nature by covering a thorough analysis of deep KWS systems (which includes speech features, acoustic modeling and posterior handling), robustness methods, applications, datasets, evaluation metrics, performance of deep KWS systems and audio-visual KWS. The analysis performed in this paper allows us to identify a number of directions for future research, including directions adopted from automatic speech recognition research and directions that are unique to the problem of spoken KWS.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2021.3139508