A First LVCSR System for Luxembourgish, a Low-Resourced European Language
Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s low-resourced languages. We describe our efforts in building a large vocabulary ASR system for such a “minority” language without resorting to any prior transcribed aud...
Saved in:
Published in | Human Language Technology Challenges for Computer Science and Linguistics Vol. 8387; pp. 479 - 490 |
---|---|
Main Authors | , , , |
Format | Book Chapter |
Language | English |
Published |
Cham
Springer International Publishing
2014
|
Series | Lecture Notes in Computer Science |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Luxembourgish is embedded in a multilingual context on the divide between Romance and Germanic cultures and remains one of Europe’s low-resourced languages. We describe our efforts in building a large vocabulary ASR system for such a “minority” language without resorting to any prior transcribed audio training data. Instead, acoustic models are derived from major European languages. Furthermore, most Luxembourgish written sources include significant parts in other languages. This poses specific challenges to Language Model estimation. Some scientific and technological issues addressed include: (i) how to build acoustic models if no labeled acoustic training data are available for the under-resourced target language? (ii) how to make use of the new system to accelerate resource production for the target language? (iii) how to build a vocabulary and a language model with multilingual written texts? (iv) how to determine the “best” phonemic inventory for ASR? First ASR results illustrate the accuracy of the various sets of monolingual and multilingual acoustic models and what these suggest concerning language typology issues. |
---|---|
ISBN: | 9783319089577 3319089579 |
ISSN: | 0302-9743 1611-3349 |
DOI: | 10.1007/978-3-319-08958-4_39 |