Multiple Deep Learning Models and Architectures with Different Numbers of States Used to Improve Retrieval Accuracy of Query-by-Example
Published in | 2021 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) pp. 1067 - 1071 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | APSIPA 14.12.2021 |
Summary: | Spoken Term Detection (STD) and Spoken Query STD (SQ-STD), also called Query by Example (QbE), which uses a spoken query, have been studied actively in recent years. When SQ-STD transcribes a spoken query into text with an automatic speech recognizer, misrecognition degrades retrieval accuracy. Posteriorgrams obtained from a Deep Neural Network (DNN) or a similar model can be regarded as speaker-independent features. Matching the posteriorgram of a spoken query against the posteriorgrams of the speech data yields high retrieval accuracy, but it requires a long retrieval time and a large memory space. In earlier papers, we proposed a maximum likelihood state sequence (MLSS) method to reduce retrieval time. Here, we propose a method that reduces both retrieval time and memory space by combining the MLSS method with multiple machine learning models that have different numbers of states. The models produce heterogeneous retrieval results, so integrating them is likely to be mutually complementary and to improve retrieval accuracy. Evaluation results demonstrate that the proposed method improves retrieval accuracy while reducing retrieval time and memory space. |
---|---|
ISSN: | 2640-0103 |
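
The matching idea in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the record gives no algorithmic detail, so the following are all assumptions made for illustration. Posteriorgrams are plain lists of per-frame probability vectors, matching uses subsequence dynamic time warping with a negative-log inner-product distance, the "state sequence" is approximated by the per-frame most probable state, and multiple models are integrated by averaging their scores. All function names are hypothetical.

```python
import math


def local_distance(query_frame, doc_frame, eps=1e-10):
    """Negative log inner product of two posterior vectors (assumed distance)."""
    dot = sum(q * d for q, d in zip(query_frame, doc_frame))
    return -math.log(max(dot, eps))


def state_distance(state, doc_frame, eps=1e-10):
    """Distance when the query frame is a single state index: one lookup."""
    return -math.log(max(doc_frame[state], eps))


def to_state_sequence(posteriorgram):
    """Assumed simplification: keep only the most probable state per frame."""
    return [max(range(len(f)), key=f.__getitem__) for f in posteriorgram]


def subsequence_dtw(query, doc, dist):
    """Lowest length-normalized cost of aligning the whole query to any
    contiguous region of doc (start and end in doc are free)."""
    n, m = len(query), len(doc)
    prev = [dist(query[0], doc[j]) for j in range(m)]  # free start point
    for i in range(1, n):
        cur = [0.0] * m
        for j in range(m):
            if j == 0:
                best = prev[0]
            else:
                best = min(prev[j], prev[j - 1], cur[j - 1])
            cur[j] = dist(query[i], doc[j]) + best
        prev = cur
    return min(prev) / n  # free end point


def fuse_scores(scores):
    """Hypothetical integration of several models: average their scores."""
    return sum(scores) / len(scores)


# Toy data: 3 states, 4 document frames, 2 query frames.
doc = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]
query = [[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]

full_score = subsequence_dtw(query, doc, local_distance)
states = to_state_sequence(query)
state_score = subsequence_dtw(states, doc, state_distance)
fused = fuse_scores([full_score, state_score])
```

In this sketch the state-sequence variant stores one integer per query frame instead of a full probability vector, and each local distance becomes a single lookup rather than an inner product. That is the kind of time and memory saving the summary attributes to the MLSS approach, although the actual method may differ.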