Part-of-speech tagging based on dictionary and statistical machine learning

Bibliographic Details
Published in	2016 35th Chinese Control Conference (CCC), pp. 6993–6998
Main Authors	Ye, Zhonglin; Jia, Zhen; Huang, Junfu; Yin, Hongfeng
Format	Conference Proceeding; Journal Article
Language	English
Published	TCCT 01.07.2016
Subjects	ambiguity word; big data; Conferences; Decision support systems; Dictionaries; Machine learning; Machine translation; Marking; maximum entropy; Natural language processing; part-of-speech tagging; Texts; Training; word segmentation dictionary

More Information
Summary:	Part-of-speech (POS) tagging is a foundation of natural language processing and is widely used in information retrieval, text processing, and machine translation. Traditional statistical machine learning methods for POS tagging rely on high-quality training data, but obtaining such data is very time-consuming. Dictionary-based POS tagging methods ignore contextual information, which leads to lower performance. This paper proposes a POS tagging approach that combines dictionary-based methods with traditional statistical machine learning. The experimental results show that the approach not only addresses the shortage of training data in statistical methods but also improves on the performance of dictionary-based methods. Using the People's Daily corpus of January 1998 as test data, the approach achieves a POS tagging accuracy of 95.80%, and an accuracy of 88% on ambiguous words.
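To illustrate the general idea of combining a dictionary with a statistical model, here is a minimal sketch. It is not the authors' method. The dictionary, training sentences, and tag names are hypothetical toy English data chosen for readability, and the disambiguation step uses a simple tag-bigram count in place of whatever statistical model the paper uses (the subject terms mention maximum entropy). Words with a single dictionary tag are tagged directly, while ambiguous words are resolved using context.

```python
from collections import Counter, defaultdict

# Hypothetical toy dictionary: word -> candidate POS tags (first = default tag).
DICTIONARY = {
    "I": ["PRP"], "will": ["MD"], "read": ["VB"],
    "the": ["DT"], "a": ["DT"], "flight": ["NN"],
    "book": ["NN", "VB"],  # ambiguous: noun or verb
}

# Hypothetical small tagged corpus used only to learn tag transitions.
TRAIN = [
    [("I", "PRP"), ("will", "MD"), ("book", "VB"), ("a", "DT"), ("flight", "NN")],
    [("I", "PRP"), ("read", "VB"), ("the", "DT"), ("book", "NN")],
]

def train_transitions(tagged_sentences):
    """Count (previous tag -> tag) bigrams from a tagged corpus."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start marker
        for _, tag in sentence:
            counts[prev][tag] += 1
            prev = tag
    return counts

def tag(words, dictionary, transitions, unknown_tag="NN"):
    """Tag a segmented sentence: use the dictionary for unambiguous words
    and bigram counts to choose among candidates for ambiguous ones."""
    result = []
    prev = "<s>"
    for word in words:
        # Assumption: out-of-dictionary words fall back to a default tag.
        candidates = dictionary.get(word, [unknown_tag])
        if len(candidates) == 1:
            best = candidates[0]
        else:
            # max() keeps the first candidate on ties, so unseen
            # contexts fall back to the dictionary's default order.
            best = max(candidates, key=lambda t: transitions[prev][t])
        result.append((word, best))
        prev = best
    return result

if __name__ == "__main__":
    transitions = train_transitions(TRAIN)
    print(tag(["I", "will", "book", "a", "flight"], DICTIONARY, transitions))
    print(tag(["read", "the", "book"], DICTIONARY, transitions))
```

In this toy data, "book" is tagged VB after the modal "will" and NN after the determiner "the". A dictionary alone would always assign its default tag.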
ISSN:	2161-2927, 1934-1768
DOI:	10.1109/ChiCC.2016.7554459