Detection of Arbitrary Wake Words by Coupling a Phoneme Predictor and a Phoneme Sequence Detector
Published in | APSIPA transactions on signal and information processing Vol. 13; no. 1 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Boston, Delft: Now Publishers (Now Publishers Inc), 01.01.2024 |
Subjects | |
ISSN | 2048-7703 |
DOI | 10.1561/116.20240014 |
Summary: | Most wake word (WW) detection systems used in smartphones
and smart speakers only detect specific, predefined WWs such as
“Hey, Siri” or “OK, Google”. To build such a system, a large speech
corpus consisting of many examples of the selected WWs must be
collected to train the model. If we want the device to detect a
different WW, collection of a new speech corpus and re-training of
the model are required.
In this study, we propose a system which is capable of detecting
any chosen WW without additional model training or a corpus of
WW utterances, allowing users to select and use their preferred
WW. Our system consists of a phoneme predictor (PP) and a
phoneme sequence detector (PSD). The PP predicts phoneme
sequences using acoustic features of the input speech, and outputs
phoneme probability distributions. The acoustic models in the PP
are trained using the Connectionist Temporal Classification (CTC)
loss criterion. The PSD takes the output of the PP as input and
predicts the probability that the WW has been uttered.
In our evaluation experiments, we performed six-phoneme WW
detection. Our results showed that the proposed method achieved
90% WW detection accuracy. |
---|---|
Bibliography: | SIP-20240014; Now Publishers; CTC; Wake word; end-to-end modeling; phoneme sequence detector |
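
The summary does not describe the internals of the PSD, so the following is only a rough illustration of the general idea of scoring an arbitrary phoneme sequence against per-frame phoneme probabilities without retraining. It swaps in the standard CTC forward algorithm as a stand-in scorer; it is not the authors' detector. The function names, the blank index `0`, the phoneme ids, and the toy probability values are all assumptions made for this sketch.

```python
import math

NEG_INF = float("-inf")


def logsumexp(*xs):
    """Numerically stable log(sum(exp(x))) for a few values."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_prob(log_probs, target, blank=0):
    """Log-probability of `target` given per-frame phoneme log-probabilities.

    log_probs: list of T frames, each a list of log-probabilities indexed by
               phoneme id (hypothetical output of a CTC-trained predictor).
    target:    list of phoneme ids for the chosen wake word (no blanks).
    """
    # Extended sequence: blank, p1, blank, p2, ..., blank
    ext = [blank]
    for p in target:
        ext += [p, blank]
    S, T = len(ext), len(log_probs)

    # Forward variables for the first frame.
    alpha = [NEG_INF] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, T):
        new = [NEG_INF] * S
        for s in range(S):
            cands = [alpha[s]]                     # stay on the same symbol
            if s >= 1:
                cands.append(alpha[s - 1])         # advance by one
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(alpha[s - 2])         # skip the blank between
            new[s] = logsumexp(*cands) + log_probs[t][ext[s]]
        alpha = new

    # Valid paths end on the last phoneme or the trailing blank.
    return logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[-1]


def to_log(frames):
    """Convert rows of probabilities to rows of log-probabilities."""
    return [[math.log(p) for p in row] for row in frames]


# Toy example: ids 0 = blank, 1 and 2 = two hypothetical phonemes.
frames = to_log([
    [0.05, 0.90, 0.05],   # frame 0 favours phoneme 1
    [0.05, 0.05, 0.90],   # frame 1 favours phoneme 2
])
score_match = ctc_log_prob(frames, [1, 2])
score_mismatch = ctc_log_prob(frames, [2, 1])
```

In this toy case the sequence `[1, 2]` scores higher than `[2, 1]`; a detector built this way would compare such a score against a tuned threshold, which is likewise an assumption and not something the record specifies.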