Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing

Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text-to-speech, and speech-to-speech translation. Moreover, large language models (LLMs) pretrained on vast text corpora contain rich linguis...


Bibliographic Details
Published in: arXiv.org
Main Authors: Trinh, Viet Anh; Southwell, Rosy; Guan, Yiwen; He, Xinlu; Wang, Zhiyong; Whitehill, Jacob
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 25.06.2024
