Fine-tuning convergence model in Bengali speech recognition


Bibliographic Details
Published in: arXiv.org
Main Authors: Zhu, Ruiying; Shen, Meng
Format: Paper / Journal Article
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 07.11.2023

Summary: Research on speech recognition has attracted considerable interest because segmenting uninterrupted speech is a difficult task. Among various languages, Bengali features distinct rhythmic patterns and tones, which make it particularly hard to recognize, and it still lacks an efficient commercial recognition method. To improve automatic speech recognition for Bengali, the authors fine-tune the pre-trained wav2vec 2.0 model to convergence. Guided by Word Error Rate (WER), the learning rate and dropout parameters were tuned; once training was stable, enlarging the training-set ratio further improved performance, reducing the WER on the test set of the official public dataset from 0.508 to 0.437. Finally, merging the training and validation sets into a single, comprehensive training set achieved a WER of 0.436.
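The WER figures quoted above (0.508, 0.437, 0.436) are the standard word-level edit-distance metric. As a point of reference, here is a minimal, self-contained sketch of how WER is computed; this is not the authors' evaluation code, just the conventional definition (word-level Levenshtein distance divided by the number of reference words):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over word sequences.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

A WER of 0.437 therefore means that, on average, about 44 word-level edits are needed per 100 reference words to match the system output.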
ISSN:2331-8422
DOI:10.48550/arxiv.2311.04122