A CLUSTERING TECHNIQUE FOR THE VIETNAMESE WORD CATEGORIZATION

Bibliographic Details
Published in	Tạp chí Khoa học Đại học Đà Lạt Vol. 6; no. 2
Main Authors	Nguyễn Minh Hiệp, Nguyễn Thị Minh Huyền, Ngô Thế Quyền, Trần Thị Phương Linh
Format	Journal Article
Language	English
Published	Dalat University 01.06.2016
Subjects	corpus; DBSCAN; part-of-speech tagging; clustering; part-of-speech set; part of speech

Summary:	In natural language processing, part-of-speech (POS) tagging plays an important role because its output feeds many other tasks (syntactic analysis, semantic analysis, etc.). One problem related to POS tagging is defining the POS set, which can be addressed with unsupervised machine learning methods. This paper applies the DBSCAN clustering algorithm to categorize Vietnamese words from a large corpus. Each word is characterized by features defined by its context in a sentence. The corpus consists of sentences automatically extracted from the online Nhan Dan newspaper.
ISSN:	0866-787X
DOI:	10.37569/DalatUniversity.6.2.40(2016)
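
The Summary describes clustering words with DBSCAN using features drawn from each word's sentence context. The following is a minimal sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the features are counts of the immediate left and right neighbours, the distance is cosine distance, the `eps` and `min_pts` values are arbitrary, and the toy English corpus stands in for the Nhan Dan sentences.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences):
    """Map each word to counts of its immediate left/right neighbours (assumed features)."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            vecs[padded[i]]["L=" + padded[i - 1]] += 1
            vecs[padded[i]]["R=" + padded[i + 1]] += 1
    return dict(vecs)


def cosine_distance(a, b):
    """Cosine distance between two sparse count vectors (assumed metric)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def dbscan(items, vecs, eps, min_pts):
    """Plain DBSCAN: returns {item: cluster_id}, with -1 marking noise."""
    def neighbours(w):
        # Includes w itself, as in the usual DBSCAN definition.
        return [o for o in items if cosine_distance(vecs[w], vecs[o]) <= eps]

    labels = {}
    cluster = 0
    for w in items:
        if w in labels:
            continue
        nb = neighbours(w)
        if len(nb) < min_pts:
            labels[w] = -1  # provisional noise; may become a border point later
            continue
        labels[w] = cluster
        queue = [o for o in nb if o != w]
        while queue:
            o = queue.pop()
            if labels.get(o) == -1:
                labels[o] = cluster  # noise reachable from a core point: border
            if o in labels:
                continue
            labels[o] = cluster
            nb_o = neighbours(o)
            if len(nb_o) >= min_pts:  # o is a core point, so keep expanding
                queue.extend(nb_o)
        cluster += 1
    return labels


# Hypothetical toy corpus, already tokenized into words.
sentences = [
    ["the", "cat", "sleeps"], ["the", "dog", "sleeps"],
    ["the", "cat", "runs"], ["the", "dog", "runs"],
    ["a", "cat", "sleeps"], ["a", "dog", "runs"],
]
vecs = context_vectors(sentences)
labels = dbscan(sorted(vecs), vecs, eps=0.2, min_pts=2)
for word in sorted(labels):
    print(word, labels[word])
```

On this toy input, words that occur in the same positions share a cluster, which is the intuition behind deriving word categories from context. A real corpus would need richer context features and tuned `eps` and `min_pts` values.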