Chinese new word extraction from MicroBlog data

Bibliographic Details
Published in Proceedings (International Conference on Machine Learning and Cybernetics), Vol. 4; pp. 1874-1879
Main Authors Qi-Long Su, Bing-Quan Liu
Format Conference Proceeding
Language English
Published IEEE 01.07.2013
Summary: Chinese new word extraction is an important task in Chinese natural language processing, and MicroBlog has become a major venue for the creation and dissemination of new words. Although many effective methods have been proposed, there is little research on Internet texts, especially MicroBlog texts. In this paper, we study a MicroBlog-oriented method for new word extraction. First, we analyze the performance of classical statistical measures in extracting new words from MicroBlog texts. Second, we base our work on Branch Entropy and, to address the shortcomings of statistical measures and the characteristics of MicroBlog texts, propose a modified method. Experimental results demonstrate that our method is feasible and effective. Lastly, we show four types of new words extracted from MicroBlog.
ISSN:2160-133X
DOI:10.1109/ICMLC.2013.6890901
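
Illustrative note: the summary names Branch Entropy as the starting point but does not describe the modified method. The sketch below shows only the generic branch (boundary) entropy measure as it is commonly defined: for each candidate character n-gram, the entropy of the characters seen immediately to its left and to its right. It is not the authors' method. The function name `branch_entropy`, the `max_len` parameter, the boundary markers, and the three-post sample corpus are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def branch_entropy(corpus, max_len=4):
    """Map each character n-gram (length 2..max_len) to (left, right) entropy.

    Hypothetical helper, not taken from the paper: a candidate that appears
    next to many different characters gets high entropy on that side, which
    is the usual signal that it forms a free-standing word.
    """
    left = defaultdict(Counter)   # candidate -> counts of preceding characters
    right = defaultdict(Counter)  # candidate -> counts of following characters
    for text in corpus:
        n = len(text)
        for i in range(n):
            for j in range(i + 2, min(i + max_len, n) + 1):
                gram = text[i:j]
                # Text boundaries are counted as their own neighbor symbols.
                left[gram][text[i - 1] if i > 0 else "<s>"] += 1
                right[gram][text[j] if j < n else "</s>"] += 1

    def entropy(counter):
        total = sum(counter.values())
        return -sum(c / total * math.log2(c / total) for c in counter.values())

    return {g: (entropy(left[g]), entropy(right[g])) for g in left}


# Made-up sample posts; a real run would use a large MicroBlog corpus.
corpus = ["我爱给力的你", "他很给力啊", "真给力吧"]
scores = branch_entropy(corpus)
print(scores["给力"])  # three distinct neighbors on each side
print(scores["爱给"])  # a single neighbor on each side
```

In this toy corpus "给力" has three distinct neighbors on each side, so both entropies equal log2(3), while "爱给" always has the same neighbors and scores zero. A candidate is usually kept only when the smaller of its two entropies clears a threshold; how the paper adapts this to MicroBlog texts is not stated in the summary.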