Code-Switching without Switching: Language Agnostic End-to-End Speech Translation
We propose a) a Language Agnostic end-to-end Speech Translation model (LAST), and b) a data augmentation strategy to increase code-switching (CS) performance. With increasing globalization, multiple languages are increasingly used interchangeably during fluent speech. Such CS complicates traditional...
Saved in:
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published |
04.10.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | We propose a) a Language Agnostic end-to-end Speech Translation model (LAST),
and b) a data augmentation strategy to increase code-switching (CS)
performance. With increasing globalization, multiple languages are increasingly
used interchangeably during fluent speech. Such CS complicates traditional
speech recognition and translation, as we must recognize which language was
spoken first and then apply a language-dependent recognizer and subsequent
translation component to generate the desired target language output. Such a
pipeline introduces latency and errors. In this paper, we eliminate the need
for that, by treating speech recognition and translation as one unified
end-to-end speech translation problem. By training LAST with both input
languages, we decode speech into one target language, regardless of the input
language. LAST delivers comparable recognition and speech translation accuracy
in monolingual usage, while reducing latency and error rate considerably when
CS is observed. |
---|---|
DOI: | 10.48550/arxiv.2210.01512 |