Domain Specific Wav2vec 2.0 Fine-tuning For The SE&R 2022 Challenge

Bibliographic Details
Published in	arXiv.org
Main Authors	Alef Iury Siqueira Ferreira, Gustavo dos Reis Oliveira
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 29.07.2022
Subjects	Automatic speech recognition; Emotion recognition
ISSN	2331-8422

More Information
Summary:	This paper presents our efforts to build a robust ASR model for the shared task Automatic Speech Recognition for spontaneous and prepared speech & Speech Emotion Recognition in Portuguese (SE&R 2022). The goal of the challenge is to advance ASR research for the Portuguese language, considering prepared and spontaneous speech in different dialects. Our method consists of fine-tuning an ASR model in a domain-specific approach, applying gain normalization and selective noise insertion. The proposed method improved over the strong baseline provided on the test set in 3 of the 4 available tracks.
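
The summary names two preprocessing steps, gain normalization and selective noise insertion, without giving details. The following is a minimal illustrative sketch of what such steps could look like, not the authors' implementation. The function names, the RMS target of -20 dBFS, the 15 dB signal-to-noise ratio, and the reading of "selective" as a per-utterance flag chosen by the caller are all assumptions made for illustration.

```python
import numpy as np


def normalize_gain(wave, target_dbfs=-20.0, eps=1e-12):
    """Scale a waveform so its RMS level equals target_dbfs (assumed target)."""
    rms = np.sqrt(np.mean(wave ** 2))
    target_rms = 10.0 ** (target_dbfs / 20.0)
    return wave * (target_rms / max(rms, eps))


def add_noise_at_snr(wave, noise, snr_db, eps=1e-12):
    """Mix noise into wave so the resulting signal-to-noise ratio is snr_db."""
    # Tile or truncate the noise clip so it matches the utterance length.
    noise = np.resize(noise, wave.shape)
    signal_power = np.mean(wave ** 2)
    noise_power = max(np.mean(noise ** 2), eps)
    scale = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return wave + scale * noise


def augment(wave, noise, apply_noise, snr_db=15.0):
    """Normalize gain, then insert noise only when apply_noise is True.

    The apply_noise flag is a hypothetical stand-in for whatever selection
    rule the paper uses; the record above does not describe it.
    """
    wave = normalize_gain(wave)
    if apply_noise:
        wave = add_noise_at_snr(wave, noise, snr_db)
    return wave
```

In this sketch the caller decides which utterances receive noise, for example by passing apply_noise=True only for a chosen subset of the training data.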