Part-of-speech tagging based on dictionary and statistical machine learning
Part-of-speech tagging is the basis of Natural Language Processing, and is widely used in information retrieval, text processing and machine translation fields. The traditional statistical machine learning methods of POS tagging rely on the high quality training data, but obtaining the training data...
Saved in:
Published in | 2016 35th Chinese Control Conference (CCC) pp. 6993 - 6998 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding Journal Article |
Language | English |
Published |
TCCT
01.07.2016
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Part-of-speech tagging is the basis of Natural Language Processing, and is widely used in information retrieval, text processing and machine translation fields. The traditional statistical machine learning methods of POS tagging rely on the high quality training data, but obtaining the training data is very time-consuming. The methods of POS tagging based on dictionaries ignore the context information, which lead to lower performance. This paper proposed a POS tagging approach which combines methods based on dictionaries and traditional statistical machine learning. The experimental results show that the approach not only can solve the problem that the training data are insufficient in statistical methods, but also can improve the performance of the methods based on dictionaries. The People's Daily corpus in January 1998 is used as testing data, and the accurate rate of POS tagging achieves 95.80%. For the ambiguity word POS tagging, the accuracy achieves 88%. |
---|---|
Bibliography: | ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Conference-1 ObjectType-Feature-3 content type line 23 SourceType-Conference Papers & Proceedings-2 |
ISSN: | 2161-2927 1934-1768 |
DOI: | 10.1109/ChiCC.2016.7554459 |