Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text-to-speech, and speech-to-speech translation. Moreover, large language models (LLMs) pretrained on vast text corpora contain rich linguis...


Bibliographic Details
Published in: arXiv.org
Main Authors: Trinh, Viet Anh; Southwell, Rosy; Guan, Yiwen; He, Xinlu; Wang, Zhiyong; Whitehill, Jacob
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 25.06.2024
