Mongolian Named Entity Recognition using suffixes segmentation
Mongolian is an agglutinative language with the complex morphological structures. Building an accurate Named Entity Recognition (NER) system for Mongolian is a challenging and meaningful work. This paper analyzes the characteristic of Mongolian suffixes using Narrow Non-Break Space and investigates...
Saved in:
Published in | 2015 International Conference on Asian Language Processing (IALP) pp. 169 - 172 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
01.10.2015
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Mongolian is an agglutinative language with the complex morphological structures. Building an accurate Named Entity Recognition (NER) system for Mongolian is a challenging and meaningful work. This paper analyzes the characteristic of Mongolian suffixes using Narrow Non-Break Space and investigates Mongolian NER system under three methods in the Condition Random Field framework. The experiment shows that segmenting each suffix into an individual token achieves the best performance than both without segmenting and using the suffixes as a feature. Our approach obtains an F-measure = 82.71. It is appropriate for the Mongolian large scale vocabulary NER. This research also makes sense to other agglutinative languages NER systems. |
---|---|
ISBN: | 1467395951 9781467395953 |
DOI: | 10.1109/IALP.2015.7451558 |