Chinese new word extraction from MicroBlog data
Published in | Proceedings (International Conference on Machine Learning and Cybernetics.) Vol. 4; pp. 1874 - 1879 |
---|---|
Main Authors | , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.07.2013 |
Summary: | Chinese new word extraction is an important task in Chinese natural language processing, and MicroBlog has become a major venue for the creation and dissemination of new words. Although many effective methods have been proposed, little research has addressed Internet texts, especially MicroBlog texts. In this paper, we study a MicroBlog-oriented method for new word extraction. First, we analyze the performance of classical statistical measures in extracting new words from MicroBlog texts. Second, building on Branch Entropy, we propose a modified method that addresses the shortcomings of these statistical measures and the characteristics of MicroBlog texts. Experimental results demonstrate that our method is feasible and effective. Finally, we present four types of new words extracted from MicroBlog. |
---|---|
ISSN: | 2160-133X |
DOI: | 10.1109/ICMLC.2013.6890901 |
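The summary names Branch Entropy as the basis of the method but does not give its definition. As a rough illustration only, the sketch below computes the classical left and right branch entropy of candidate strings, H(w) = -Σ p(c|w) log p(c|w) over the characters c seen immediately to the left (or right) of w. This is the standard measure, not the authors' modified method, which the summary does not describe. The function names, the 2 to 4 character candidate window, the boundary markers and the threshold are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical markers used when a candidate touches the start/end of a post.
# All boundary occurrences collapse into one neighbour type (an assumption).
BOS, EOS = "<S>", "</S>"


def entropy(counts: Counter) -> float:
    """Shannon entropy (natural log) of a neighbour-frequency distribution."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def branch_entropy(posts: list[str], max_len: int = 4) -> dict[str, tuple[float, float]]:
    """Map each substring of 2..max_len characters to (left BE, right BE)."""
    left = defaultdict(Counter)   # candidate -> counts of left neighbours
    right = defaultdict(Counter)  # candidate -> counts of right neighbours
    for post in posts:
        n = len(post)
        for length in range(2, max_len + 1):
            for i in range(n - length + 1):
                cand = post[i:i + length]
                left[cand][post[i - 1] if i > 0 else BOS] += 1
                right[cand][post[i + length] if i + length < n else EOS] += 1
    return {c: (entropy(left[c]), entropy(right[c])) for c in left}


def extract_candidates(posts: list[str], threshold: float = 1.0, max_len: int = 4) -> list[str]:
    """Keep candidates whose weaker side still has branch entropy above threshold."""
    scores = branch_entropy(posts, max_len)
    return sorted(c for c, (lt, rt) in scores.items() if min(lt, rt) > threshold)


if __name__ == "__main__":
    posts = ["给力的新闻", "真给力啊", "太给力了"]
    print(extract_candidates(posts))  # ['给力']
```

In this toy corpus, 给力 is preceded by three different contexts and followed by three different characters, so both of its branch entropies equal ln 3 (about 1.10), while every other substring occurs once and scores 0. Taking the minimum of the two sides is one common way to require that a candidate be free on both boundaries; real MicroBlog data would also need frequency filtering and an internal-cohesion measure, which this sketch omits.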