Recent Developments on ESPnet Toolkit Boosted by Conformer

Bibliographic Details
Published in arXiv.org
Main Authors Guo, Pengcheng; Boyer, Florian; Chang, Xuankai; Hayashi, Tomoki; Higuchi, Yosuke; Inaguma, Hirofumi; Kamo, Naoyuki; Li, Chenda; Garcia-Romero, Daniel; Shi, Jiatong; Shi, Jing; Watanabe, Shinji; Wei, Kun; Zhang, Wangyou; Zhang, Yuekai
Format Paper
Language English
Published Ithaca Cornell University Library, arXiv.org 29.10.2020
Summary: In this study, we present recent developments in ESPnet, the End-to-End Speech Processing toolkit, which mainly involve a recently proposed architecture called Conformer, a convolution-augmented Transformer. This paper shows results for a wide range of end-to-end speech processing applications, such as automatic speech recognition (ASR), speech translation (ST), speech separation (SS) and text-to-speech (TTS). Our experiments reveal various training tips and significant performance benefits obtained with the Conformer on different tasks. These results are competitive with, or even outperform, the current state-of-the-art Transformer models. We are preparing to release all-in-one recipes using open-source and publicly available corpora for all the above tasks, together with pre-trained models. Our aim for this work is to contribute to our research community by reducing the burden of preparing state-of-the-art research environments, which usually require substantial resources.
ISSN: 2331-8422