Part-of-speech tagging based on dictionary and statistical machine learning


Bibliographic Details
Published in: 2016 35th Chinese Control Conference (CCC), pp. 6993-6998
Main Authors: Ye, Zhonglin; Jia, Zhen; Huang, Junfu; Yin, Hongfeng
Format: Conference Proceeding; Journal Article
Language: English
Published: TCCT, 01.07.2016
Summary: Part-of-speech (POS) tagging is a foundation of natural language processing and is widely used in information retrieval, text processing, and machine translation. Traditional statistical machine learning methods for POS tagging rely on high-quality training data, but obtaining such data is very time-consuming. Dictionary-based POS tagging methods ignore context information, which leads to lower performance. This paper proposes a POS tagging approach that combines dictionary-based methods with traditional statistical machine learning. The experimental results show that the approach not only addresses the shortage of training data in statistical methods but also improves the performance of dictionary-based methods. The People's Daily corpus from January 1998 is used as test data, on which POS tagging accuracy reaches 95.80%. For ambiguous words, tagging accuracy reaches 88%.
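
The summary does not spell out how the dictionary and the statistical model are combined, so the following is only an illustrative sketch of one possible combination, not the authors' method. It assumes a dictionary that lists the candidate tags for each word, takes unambiguous words directly from it, and resolves ambiguous words with smoothed transition and emission counts from a small tagged sample, greedily from left to right. The toy English data, the tag names, and the functions `train_counts` and `tag_sentence` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical toy dictionary: word -> set of tags the word may take.
DICTIONARY = {
    "the": {"DET"},
    "a": {"DET"},
    "dog": {"NOUN"},
    "is": {"VERB"},
    "long": {"ADJ"},
    "can": {"AUX", "NOUN", "VERB"},   # ambiguous
    "run": {"NOUN", "VERB"},          # ambiguous
}

# Hypothetical tagged sample standing in for scarce training data.
TRAINING = [
    [("the", "DET"), ("dog", "NOUN"), ("can", "AUX"), ("run", "VERB")],
    [("the", "DET"), ("run", "NOUN"), ("is", "VERB"), ("long", "ADJ")],
    [("a", "DET"), ("can", "NOUN"), ("is", "VERB"), ("long", "ADJ")],
]


def train_counts(sentences):
    """Count tag-to-tag transitions and word-to-tag emissions."""
    transitions = defaultdict(Counter)
    emissions = defaultdict(Counter)
    for sentence in sentences:
        prev = "<s>"  # sentence-start marker
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[word.lower()][tag] += 1
            prev = tag
    return transitions, emissions


def tag_sentence(words, dictionary, transitions, emissions, default_tag="NOUN"):
    """Tag words left to right, restricting each word to its dictionary tags."""
    tags = []
    prev = "<s>"
    for word in words:
        key = word.lower()
        # Words missing from the dictionary fall back to a default tag (an assumption).
        candidates = sorted(dictionary.get(key, {default_tag}))
        if len(candidates) == 1:
            # Unambiguous word: the dictionary alone decides.
            chosen = candidates[0]
        else:
            # Ambiguous word: add-one smoothed transition count times emission count.
            def score(tag):
                t = transitions.get(prev, {}).get(tag, 0)
                e = emissions.get(key, {}).get(tag, 0)
                return (t + 1) * (e + 1)

            chosen = max(candidates, key=score)
        tags.append(chosen)
        prev = chosen
    return tags


TRANSITIONS, EMISSIONS = train_counts(TRAINING)
```

Calling `tag_sentence(["the", "dog", "can", "run"], DICTIONARY, TRANSITIONS, EMISSIONS)` tags a sentence under these assumptions. A greedy decoder keeps the sketch short; a Viterbi search over the same dictionary-restricted candidates would be a natural alternative.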
ISSN: 2161-2927, 1934-1768
DOI: 10.1109/ChiCC.2016.7554459