Identifying top Chinese network buzzwords from social media big data set based on time-distribution features

Bibliographic Details
Published in: 2014 IEEE International Conference on Big Data (Big Data), pp. 924-931
Main Authors: Yongli Tang, Tingting He, Bo Li, Xiaohua Hu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2014
Summary: Buzzwords are the main embodiment of Internet culture and play an important role in public opinion analysis, social focus tracking, and the study of language evolution. At present, questionnaires are widely used as the standard method for obtaining network buzzwords, but this approach is subjective and costly. In this paper, we propose a novel algorithm that relies on the time-distribution feature of words and a KL-divergence measure to estimate word popularity and thereby identify the buzzwords of a specific period. The time-distribution feature captures the fact that a buzzword's usage increases sharply within a very short period; this is modeled formally with the KL-divergence measure. Compared with the traditional method, which requires considerable manual effort, the automatic algorithm presented here is more efficient. Moreover, buzzwords identified in this manner are not affected by individuals' subjective opinions, so they better reflect actual language usage. When the algorithm is applied to a large social media data set, our experimental results show that it accurately identifies the buzzwords of a given period, and the results closely match those tagged manually.
DOI: 10.1109/BigData.2014.7004324
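
The summary describes scoring words by how sharply their usage concentrates in time, measured with KL divergence, but it does not give the exact formulation. The following is a minimal sketch of that general idea, not the authors' algorithm. It assumes each word's counts per time bin are normalized into a distribution and compared against a uniform baseline; the function names, the uniform baseline, and the sample counts are all hypothetical.

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def burst_score(counts):
    """Score how far a word's usage over time departs from a flat baseline.

    counts: occurrences of one word in consecutive, equal-length time bins.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    # The word's time distribution: share of its total usage in each bin.
    p = [c / total for c in counts]
    # Assumption: a uniform reference distribution (usage spread evenly).
    q = [1.0 / len(counts)] * len(counts)
    return kl_divergence(p, q)


# Hypothetical per-bin counts for three words (illustrative data only).
bin_counts = {
    "steady_word": [10, 10, 10, 10],
    "bursty_word": [0, 0, 40, 0],
    "mild_word": [5, 10, 15, 10],
}

# Rank words from most to least concentrated in time.
ranked = sorted(bin_counts, key=lambda w: burst_score(bin_counts[w]), reverse=True)
print(ranked)
```

A uniform baseline ignores changes in overall posting volume and scores any concentration of usage, not only a rise; a practical variant might instead compare against the time distribution of all words in the corpus and also weight by total frequency.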