Morphology generation for English-Indian language statistical machine translation

Bibliographic Details
Published in	Soft computing (Berlin, Germany) Vol. 25; no. 5; pp. 3657 - 3664
Main Author	Sreelekha, S.
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.03.2021
Subjects	Artificial Intelligence; Computational Intelligence; Control; Engineering; Mathematical Logic and Foundations; Mechatronics; Methodologies and Application; Robotics; Statistical machine translation; Machine translation; Morphology
Summary:	When translating into morphologically rich languages, statistical MT approaches face the problem of data sparsity. The problem is most severe when the corpus for the morphologically richer language is small. Although factored models can be used to generate the correct morphological forms of words, data sparseness limits their performance. In this paper, we describe a simple and effective solution based on enriching the input corpora with various morphological forms of words. We apply this method in phrase-based and factor-based experiments translating from English into two morphologically rich languages, Hindi and Marathi. We evaluate our experiments using both automatic evaluation and subjective measures such as adequacy and fluency. We observe that the morphology injection method improves translation quality. Our analysis further indicates that morphology injection substantially alleviates the data sparseness problem.
ISSN:	1432-7643, 1433-7479
DOI:	10.1007/s00500-020-05393-7