Improve topic modeling algorithms based on Twitter hashtags

Bibliographic Details
Published in	Journal of physics. Conference series Vol. 1660; no. 1; pp. 12100 - 12108
Main Authors	Alash, Hayder M, Al-Sultany, Ghaidaa A
Format	Journal Article
Language	English
Published	Bristol IOP Publishing 01.11.2020
Subjects	Algorithms; Clustering; Coherence; Data mining; Dirichlet problem; Hashtag; Latent Dirichlet Allocation (LDA); Latent semantic analysis (LSA); Modelling; Physics; Social networks; Topic Derivation; Twitter; Unstructured data
ISSN	1742-6588 1742-6596
DOI	10.1088/1742-6596/1660/1/012100

Summary:	With the growing use of social media, many researchers have become interested in extracting topics from Twitter. Tweets are short, unstructured, and messy, which makes finding topics in them challenging. Topic modeling algorithms such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) were originally designed to derive topics from large documents such as articles and books, so they are often less efficient when applied to short text content such as tweets. Fortunately, Twitter has many features that represent the interaction between users, and tweets carry rich user-generated hashtags that serve as keywords. In this paper, we exploit the hashtag feature to improve the topics learned from Twitter content without modifying the basic LSA and LDA topic models, on the premise that users who share the same hashtag mostly discuss the same topic. We compare the performance of the two methods (LSA and LDA) using topic coherence, with and without hashtags. The experimental results on the Twitter dataset show that LSA achieves a better coherence score with hashtags than without them. In contrast, LDA achieves a better coherence score without incorporating hashtags. Overall, LDA has a better coherence score than LSA: the best coherence score obtained was 0.6047 for LDA and 0.4744 for LSA, but the number of topics in LDA was higher than in LSA. Thus, LDA may assign tweets that discuss the same subject to different clusters.
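
The summary does not say how the hashtags are fed to LSA and LDA. One common way to use hashtags without changing the topic model is hashtag pooling, where tweets that share a hashtag are merged into a single longer pseudo-document before modeling. The sketch below illustrates that idea only; it is an assumption, not necessarily the authors' procedure, and the names `extract_hashtags` and `pool_by_hashtag` are hypothetical.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(tweet: str) -> list:
    """Return the lowercased hashtags in a tweet, without the '#' sign."""
    return [tag.lower() for tag in HASHTAG_RE.findall(tweet)]


def pool_by_hashtag(tweets: list) -> list:
    """Merge tweets that share a hashtag into one pseudo-document each.

    A tweet with several hashtags contributes to every matching pool.
    A tweet with no hashtag stays a document on its own.
    """
    pools = defaultdict(list)
    unpooled = []
    for tweet in tweets:
        tags = extract_hashtags(tweet)
        if not tags:
            unpooled.append(tweet)
        # dict.fromkeys drops repeated tags while keeping their order.
        for tag in dict.fromkeys(tags):
            pools[tag].append(tweet)
    return [" ".join(group) for group in pools.values()] + unpooled


# Illustrative input, not data from the paper.
tweets = [
    "Training a model #nlp",
    "New paper out #NLP #ml",
    "Lunch time",
]
documents = pool_by_hashtag(tweets)
```

The resulting `documents` list could then be passed, in place of the raw tweets, to any unmodified LSA or LDA implementation, and the topics from both runs compared with a topic coherence measure.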