Leveraging textual information for social media news categorization and sentiment analysis

Bibliographic Details
Published in	PloS one Vol. 19; no. 7; p. e0307027
Main Authors	Hasan, Mahmudul; Ahmed, Tanver; Islam, Md Rashedul; Uddin, Md Palash
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 15.07.2024
Subjects	Accuracy; Algorithms; Classification; Clustering; Computational linguistics; Computer and Information Sciences; Current events; Data mining; Datasets; Decision making; Decision trees; Digital media; Effectiveness; Emotions; Engineering and Technology; Humans; Influence; Information management; Language processing; Machine Learning; Natural language interfaces; Neural networks; News; Physical Sciences; Regularization; Research and Analysis Methods; Semantics; Sentiment analysis; Social discrimination learning; Social Media; Social networks; Social Sciences; Stochasticity; Strings; Support vector machines; Trends; Unstructured data; User behavior; User generated content; Bangladesh; Canada
Summary:	The rise of social media has changed how people view connections. Machine Learning (ML)-based sentiment analysis and news categorization help people understand emotions and access news. However, most studies focus on complex models that require heavy resources and slow inference, which makes deployment difficult in resource-limited environments. In this paper, we process both structured and unstructured data and use the TextBlob scheme to determine the polarity, and hence the sentiment, of news headlines. We propose SGDR, a Stochastic Gradient Descent (SGD)-based Ridge classifier (RC), and blend it with an advanced string processing technique to classify news articles effectively. Additionally, we explore existing supervised and unsupervised ML algorithms to gauge the effectiveness of our SGDR classifier. The scalability and generalization capability of SGD, together with the L2 regularization in RCs that handles overfitting and balances bias and variance, give the proposed SGDR better classification capability. Experimental results show that our string processing pipeline significantly boosts the performance of all ML models. Notably, our ensemble SGDR classifier surpasses all the state-of-the-art ML algorithms compared, achieving 98.12% accuracy. McNemar's significance tests show that the improvement of our SGDR classifier is significant at the 1% level over K-Nearest Neighbor, Decision Tree, and AdaBoost and at the 5% level over the other algorithms. These findings underscore the superior proficiency of linear models in news categorization compared to tree-based and nonlinear counterparts. This study offers insight into the efficacy of the proposed methodology and its potential for news categorization and sentiment analysis.
Bibliography:	Competing Interests: Authors have no conflict of interest to declare.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0307027
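
Illustrative sketch (not part of the catalog record, and not the authors' implementation): the summary describes a ridge classifier, that is, squared loss with an L2 penalty, trained by stochastic gradient descent. The standard-library Python below shows one way such a model could be fitted on bag-of-words counts. The function names (tokenize, train_sgd_ridge, predict), the hyperparameter values, and the six toy headlines are all assumptions made for illustration; the paper's string processing pipeline, ensemble step, and TextBlob sentiment stage are not reproduced.

```python
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase a headline and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_sgd_ridge(samples, labels, epochs=50, lr=0.1, alpha=0.01, seed=0):
    """Fit one-vs-rest linear models with squared loss and an L2 penalty via SGD.

    Each class is regressed onto +1 (own samples) or -1 (all others),
    the target encoding a ridge classifier uses. All hyperparameter
    defaults are assumed values, not taken from the paper.
    """
    rng = random.Random(seed)
    classes = sorted(set(labels))
    weights = {c: {} for c in classes}   # sparse weights: token -> value
    bias = {c: 0.0 for c in classes}
    data = [(Counter(tokenize(s)), y) for s, y in zip(samples, labels)]
    for _ in range(epochs):
        rng.shuffle(data)
        for counts, label in data:
            for c in classes:
                w = weights[c]
                target = 1.0 if label == c else -1.0
                pred = bias[c] + sum(w.get(t, 0.0) * n for t, n in counts.items())
                error = pred - target
                # L2 penalty: shrink every existing weight (bias is not penalized).
                for t in w:
                    w[t] *= 1.0 - lr * alpha
                # Squared-loss gradient step on the features present in this sample.
                for t, n in counts.items():
                    w[t] = w.get(t, 0.0) - lr * error * n
                bias[c] -= lr * error
    return weights, bias


def predict(model, text):
    """Return the class whose linear score is highest for the given headline."""
    weights, bias = model
    counts = Counter(tokenize(text))
    scores = {
        c: bias[c] + sum(weights[c].get(t, 0.0) * n for t, n in counts.items())
        for c in weights
    }
    return max(scores, key=scores.get)


# Hypothetical toy headlines, invented for this sketch.
samples = [
    "stocks rally as markets rebound",
    "central bank raises interest rates",
    "team wins championship final",
    "striker scores twice in derby",
    "new phone chip boosts speed",
    "software update fixes security bug",
]
labels = ["business", "business", "sports", "sports", "tech", "tech"]

model = train_sgd_ridge(samples, labels)
print(predict(model, "bank rates rise as markets fall"))
print(predict(model, "striker wins final"))
```

In practice a library implementation with a proper text vectorizer would likely replace this hand-rolled loop; the sketch only makes the "squared loss + L2 penalty + per-sample updates" idea concrete.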