Mongolian Named Entity Recognition using suffixes segmentation

Mongolian is an agglutinative language with the complex morphological structures. Building an accurate Named Entity Recognition (NER) system for Mongolian is a challenging and meaningful work. This paper analyzes the characteristic of Mongolian suffixes using Narrow Non-Break Space and investigates...

Full description

Saved in:
Bibliographic Details
Published in2015 International Conference on Asian Language Processing (IALP) pp. 169 - 172
Main Authors Weihua Wang, Feilong Bao, Guanglai Gao
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.10.2015
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Mongolian is an agglutinative language with the complex morphological structures. Building an accurate Named Entity Recognition (NER) system for Mongolian is a challenging and meaningful work. This paper analyzes the characteristic of Mongolian suffixes using Narrow Non-Break Space and investigates Mongolian NER system under three methods in the Condition Random Field framework. The experiment shows that segmenting each suffix into an individual token achieves the best performance than both without segmenting and using the suffixes as a feature. Our approach obtains an F-measure = 82.71. It is appropriate for the Mongolian large scale vocabulary NER. This research also makes sense to other agglutinative languages NER systems.
ISBN:1467395951
9781467395953
DOI:10.1109/IALP.2015.7451558