Recent Developments on ESPnet Toolkit Boosted by Conformer

Bibliographic Details
Published in	arXiv.org
Main Authors	Guo, Pengcheng; Boyer, Florian; Chang, Xuankai; Hayashi, Tomoki; Higuchi, Yosuke; Inaguma, Hirofumi; Kamo, Naoyuki; Li, Chenda; Garcia-Romero, Daniel; Shi, Jiatong; Shi, Jing; Watanabe, Shinji; Wei, Kun; Zhang, Wangyou; Zhang, Yuekai
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 29.10.2020
Subjects	Automatic speech recognition; Convolution; Speech; Speech processing; Toolkits; Transformers; Translations
Summary:	In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer (Convolution-augmented Transformer). This paper reports results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS), and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, current state-of-the-art Transformer models. We are preparing to release all-in-one recipes, together with pre-trained models, for all of the above tasks using open-source and publicly available corpora. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.
ISSN:	2331-8422