The first Mirandese text-to-speech system

Bibliographic Details
Published in Language documentation and conservation p. 150
Main Authors Ferreira, José Pedro; Chesi, Cristiano; Baldewijns, Daan; Braga, Daniela; Dias, Miguel; Correia, Margarita
Format Journal Article
Language English
Published Honolulu University of Hawaii Press 01.01.2015
Summary: This paper describes the creation of base NLP resources and tools for an under-resourced minority language spoken in Portugal, Mirandese, in the context of the generation of a text-to-speech system, a collaborative citizenship project between Microsoft, ILTEC, and ALM - Associacion de la Lhengua Mirandesa. Development efforts encompassed the compilation of a large textual corpus, definition of a complete phone-set, development of a tokenizer, inflector, TN and GTP modules, and creation of a large phonetic lexicon with syllable segmentation, stress mark-up, and POS. The TTS system will provide an open access web interface freely available to the community, along with the other resources. We took advantage of mature tools, resources, and processes already available for phylogenetically close languages, allowing us to cut development time and resources to a great extent, a solution that can be viable for other lesser-spoken languages which enjoy a similar situation.
ISSN:1934-5275