Multiple Deep Learning Models and Architectures with Different Numbers of States Used to Improve Retrieval Accuracy of Query-by-Example

Bibliographic Details
Published in: 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 1067-1071
Main Authors: Hatakeyama, Kazuki; Nishino, Masahiro; Kojima, Kazunori; Lee, Shi-wook; Itoh, Yoshiaki
Format: Conference Proceeding
Language: English
Published: APSIPA, 14.12.2021

More Information
Summary: Spoken Term Detection (STD), and Spoken Query STD (SQ-STD) or Query by Example (QbE) using a spoken query, have been studied actively in recent years. When SQ-STD transcribes a spoken query into text with an automatic speech recognizer, misrecognitions degrade retrieval accuracy. Posteriorgrams obtained from a Deep Neural Network (DNN) or similar models can be regarded as speaker-independent features. Matching the posteriorgram of a spoken query against the posteriorgrams of the speech data achieves high retrieval accuracy, but it requires a long retrieval time and a large memory space. In earlier papers, we proposed a maximum likelihood state sequence (MLSS) method to reduce retrieval time. In this paper, we propose a method that reduces both retrieval time and memory space by combining the MLSS method with multiple machine learning models that have different numbers of states. These models produce heterogeneous retrieval results; because those results are likely to complement one another, integrating them is expected to improve retrieval accuracy. Evaluation results demonstrate that the proposed method improves retrieval accuracy while reducing retrieval time and memory space.
ISSN: 2640-0103
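The summary does not specify the matching algorithm, the internals of MLSS, or the integration rule, so the following is only a minimal illustrative sketch under stated assumptions, not the authors' implementation. It assumes (1) posteriorgram matching is done with subsequence dynamic time warping (DTW) using a -log inner-product local distance, a common choice in posteriorgram-based QbE; (2) an MLSS-like representation can be approximated by keeping only the most likely state per frame, which stores one integer per frame instead of a full posterior vector (this reading of MLSS is an assumption); and (3) results from models with different numbers of states are integrated by averaging min-max normalised distances. All function names (`neg_log_inner_product`, `subsequence_dtw`, `to_state_sequence`, `state_mismatch`, `fuse_scores`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one posterior vector (probabilities over model states)


def neg_log_inner_product(q: Frame, d: Frame, eps: float = 1e-10) -> float:
    """Assumed local distance between two posterior vectors: -log(q . d)."""
    return -math.log(max(sum(a * b for a, b in zip(q, d)), eps))


def subsequence_dtw(query: Sequence, data: Sequence,
                    dist: Callable[[object, object], float]) -> float:
    """Cost of the best match of `query` anywhere in `data`, divided by query length.

    Subsequence DTW: the match may start and end at any frame of `data`.
    """
    n, m = len(query), len(data)
    # Row 0: free start point, so each data frame can begin a match.
    prev = [dist(query[0], data[j]) for j in range(m)]
    for i in range(1, n):
        cur = [0.0] * m
        cur[0] = prev[0] + dist(query[i], data[0])
        for j in range(1, m):
            cur[j] = dist(query[i], data[j]) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    # Free end point: take the cheapest last-row cell.
    return min(prev) / n


def to_state_sequence(posteriorgram: Sequence[Frame]) -> List[int]:
    """Hypothetical MLSS-like reduction: keep only the argmax state of each frame."""
    return [max(range(len(f)), key=f.__getitem__) for f in posteriorgram]


def state_mismatch(a: int, b: int) -> float:
    """Assumed 0/1 local distance between two state labels."""
    return 0.0 if a == b else 1.0


def fuse_scores(per_model: List[List[float]]) -> List[float]:
    """Hypothetical integration: mean of min-max normalised distances per candidate.

    Each inner list holds one model's distances for the same candidates;
    lower fused values mean better matches.
    """
    fused = [0.0] * len(per_model[0])
    for scores in per_model:
        lo, hi = min(scores), max(scores)
        span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
        for k, s in enumerate(scores):
            fused[k] += (s - lo) / span / len(per_model)
    return fused


if __name__ == "__main__":
    query = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1]]
    utterance = [[0.2, 0.2, 0.6], [0.8, 0.1, 0.1], [0.1, 0.9, 0.0], [0.3, 0.3, 0.4]]
    print(subsequence_dtw(query, utterance, neg_log_inner_product))
    q_states = to_state_sequence(query)        # [0, 1]
    d_states = to_state_sequence(utterance)    # [2, 0, 1, 2]
    print(subsequence_dtw(q_states, d_states, state_mismatch))  # 0.0
    print(fuse_scores([[1.0, 3.0, 2.0], [10.0, 20.0, 30.0]]))   # [0.0, 0.75, 0.75]
```

The state-sequence variant replaces each posterior vector with a single integer, which illustrates why a per-frame state representation can cut memory; whether MLSS computes its sequence exactly this way is not stated in this record.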