Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis

Bibliographic Details
Published in	Frontiers in artificial intelligence Vol. 3; p. 42
Main Authors	Albalawi, Rania, Yeap, Tet Hin, Benyoucef, Morad
Format	Journal Article
Language	English
Published	Switzerland: Frontiers Media S.A., 14.07.2020
Subjects	Artificial Intelligence; natural language processing; online social networks; short text; topic modeling; user-generated content
Online Access	Get full text
ISSN	2624-8212
DOI	10.3389/frai.2020.00042

More Information
Summary:	With the growth of online social network platforms and applications, large amounts of textual user-generated content are created daily in the form of comments, reviews, and short-text messages. As a result, users often find it challenging to discover useful information or more on the topic being discussed from such content. Machine learning and natural language processing algorithms are used to analyze the massive amount of textual social media data available online, including topic modeling techniques that have gained popularity in recent years. This paper investigates the topic modeling subject and its common application areas, methods, and tools. Also, we examine and compare five frequently used topic modeling methods, as applied to short textual social data, to show their benefits practically in detecting important topics. These methods are latent semantic analysis, latent Dirichlet allocation, non-negative matrix factorization, random projection, and principal component analysis. Two textual datasets were selected to evaluate the performance of included topic modeling methods based on the topic quality and some standard statistical evaluation metrics, like recall, precision, F-score, and topic coherence. As a result, latent Dirichlet allocation and non-negative matrix factorization methods delivered more meaningful extracted topics and obtained good results. The paper sheds light on some common topic modeling methods in a short-text context and provides direction for researchers who seek to apply these methods.
Bibliography:	Edited by: Anis Yazidi, OsloMet—Oslo Metropolitan University, Norway. Reviewed by: Lei Jiao, University of Agder, Norway; Ashish Rauniyar, University of Oslo, Norway, in Collaboration With Reviewer LJ; Imen Ben Sassi, Tallinn University of Technology, Estonia; Desta Haileselassie Hagos, Oslo Metropolitan University, Norway, in Collaboration With Reviewer IS. This article was submitted to Machine Learning and Artificial Intelligence, a section of the journal Frontiers in Artificial Intelligence.