An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion

Self-supervised learning (SSL) has become one of the most important technologies to realize spoken dialogue systems for languages that do not have much audio data and its transcription available. Speech representation models are one of the keys to achieving this, and have been actively studied in re...

Full description

Saved in:
Bibliographic Details
Published inICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 7107 - 7111
Main Authors Maekaku, Takashi, Chang, Xuankai, Fujita, Yuya, Watanabe, Shinji
Format Conference Proceeding
LanguageEnglish
Published IEEE 23.05.2022
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Self-supervised learning (SSL) has become one of the most important technologies to realize spoken dialogue systems for languages that do not have much audio data and its transcription available. Speech representation models are one of the keys to achieving this, and have been actively studied in recent years. Among them, Hidden-Unit BERT (HuBERT) has shown promising results in automatic speech recognition (ASR) tasks. However, previous studies have investigated with limited iterations and cluster units. We explore HuBERT with larger numbers of clusters and iterations in order to obtain better speech representation. Furthermore, we introduce the Bayesian Information Criterion (BIC) as the performance measure of the model. Experimental results show that our model achieves the best performance in 5 out of 8 scores in the 4 metrics for the Zero Resource Speech 2021 task. It also outperforms the HuBERT BASE model trained with 960-hour LibriSpeech (LS) even though our model is only trained with 100-hour LS. In addition, we report that BIC is useful as a clue for determining the appropriate number of clusters to improve performance on phonetic, lexical, and syntactic metrics. Finally, we show that these findings are also effective for the ASR task.
ISSN:2379-190X
DOI:10.1109/ICASSP43922.2022.9746097