Improve topic modeling algorithms based on Twitter hashtags
Published in | Journal of physics. Conference series Vol. 1660; no. 1; pp. 12100 - 12108 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Bristol IOP Publishing 01.11.2020 |
Subjects | |
ISSN | 1742-6588 1742-6596 |
DOI | 10.1088/1742-6596/1660/1/012100 |
Summary: | With the growing use of social media, many researchers have become interested in topic extraction from Twitter. Tweets are short, unstructured, and noisy, which makes finding topics in them difficult. Topic modeling algorithms such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) were originally designed to derive topics from large documents such as articles and books, and they are often less effective when applied to short text content like tweets. Fortunately, Twitter has many features that represent the interaction between users, and tweets are rich in user-generated hashtags that act as keywords. In this paper, we exploit hashtags to improve the topics learned from Twitter content without modifying the basic LSA and LDA topic models, on the assumption that users who share the same hashtag mostly discuss the same topic. We compare the performance of the two methods (LSA and LDA) using topic coherence, with and without hashtags. Experiments on a Twitter dataset show that LSA achieves a better coherence score with hashtags than without them, whereas LDA achieves a better coherence score without hashtags. Overall, LDA has a better coherence score than LSA: the best coherence obtained was 0.6047 for LDA and 0.4744 for LSA, but the number of topics in LDA was higher than in LSA. Thus, LDA may assign tweets that discuss the same subject to different clusters. |
---|---|
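The summary says hashtags are used to improve topics without changing the LSA or LDA model itself, but this record does not say how. One common way to do that is hashtag pooling: tweets sharing a hashtag are merged into one longer pseudo-document before being passed to an unmodified topic model. The sketch below illustrates that idea only; it is an assumption, not the paper's confirmed method, and `pool_by_hashtag` and the sample tweets are hypothetical.

```python
import re
from collections import defaultdict

# Matches a hashtag and captures the tag text without the leading '#'.
HASHTAG_RE = re.compile(r"#(\w+)")


def pool_by_hashtag(tweets):
    """Merge tweets that share a hashtag into one pseudo-document per tag.

    Hypothetical helper, assumed for illustration. Returns a dict mapping
    each lowercased hashtag to the concatenated text of its tweets, plus a
    list of tweets that contain no hashtag at all.
    """
    pools = defaultdict(list)
    unpooled = []
    for text in tweets:
        # Lowercase and deduplicate so '#NLP' and '#nlp' share one pool.
        tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
        if not tags:
            unpooled.append(text)
            continue
        for tag in tags:
            pools[tag].append(text)
    documents = {tag: " ".join(texts) for tag, texts in pools.items()}
    return documents, unpooled


# Made-up sample tweets, not data from the paper.
tweets = [
    "Training a new model today #NLP",
    "Topic models struggle on short text #nlp #LDA",
    "Coffee first",
]
documents, unpooled = pool_by_hashtag(tweets)
```

The pooled `documents` values could then be vectorized and given to any standard LSA or LDA implementation in place of individual tweets, with topic coherence computed for both the pooled and unpooled runs to compare them.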