Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
Recent work on discrete speech tokenization has paved the way for models that can seamlessly perform multiple tasks across modalities, e.g., speech recognition, text-to-speech, and speech-to-speech translation. Moreover, large language models (LLMs) pretrained on vast text corpora contain rich linguis...
Published in | arXiv.org |
---|---|
Main Authors | , , , , , |
Format | Paper |
Language | English |
Published | Ithaca: Cornell University Library, arXiv.org, 25.06.2024 |
Subjects | |