Two-Stage Dimensionality Reduction for Social Media Engagement Classification

Bibliographic Details
Published in: Applied Sciences, Vol. 14, no. 3, p. 1269
Main Authors: Vieira Sobrinho, Jose Luis; Teles Vieira, Flavio Henrique; Assis Cardoso, Alisson
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.02.2024

Summary: The high dimensionality of real-life datasets is one of the biggest challenges in machine learning. Because the demand for computational resources grows with the dimension of the input data, the higher that dimension, the more difficult the learning task becomes, a phenomenon commonly referred to as the curse of dimensionality. Building on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features into a new subset by maximizing the pairwise separation probability, with the aim of avoiding overlap between nearby instances of different classes, also known as the class masking problem. The second stage transforms the resulting subset into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while minimizing the dispersion of instances within the same class. The second stage thus aims to improve the accuracy of the subsequent classifier by lowering its sensitivity to an imbalanced distribution of instances across classes. Experiments on benchmark and social media datasets show that the proposed method is promising compared with several well-established algorithms, especially for social media engagement classification.
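The summary gives no formulas, so what follows is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' TSDR implementation. Stage 1 is approximated by feature selection rather than extraction: each feature is scored with a hypothetical pairwise separation probability (the Bayes accuracy Phi(d/2) for two equal-variance 1-D Gaussians), and the worst-separated class pair decides the score. This is one plausible way to counter class masking. Stage 2 is shown as two-class Fisher linear discriminant analysis, a standard transform whose objective matches the stated goal of large between-class center distance and small within-class dispersion. The paper's second stage may differ and may handle more than two classes. The function names (`pairwise_separation`, `stage1_select`, `stage2_fisher`, `tsdr_sketch`), the `ridge` term, and the toy data are all hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev


def pairwise_separation(a, b):
    """Hypothetical proxy for a pairwise separation probability on one feature:
    accuracy of the Bayes rule for two equal-variance 1-D Gaussians, Phi(d / 2),
    where d = |mean(a) - mean(b)| / pooled standard deviation."""
    pooled = math.sqrt((pstdev(a) ** 2 + pstdev(b) ** 2) / 2) or 1e-12
    d = abs(mean(a) - mean(b)) / pooled
    return 0.5 * (1 + math.erf(d / (2 * math.sqrt(2))))


def stage1_select(X, y, k):
    """Stage 1 stand-in: keep the k features whose WORST class pair is best
    separated, so close class pairs are not masked by distant ones."""
    classes = sorted(set(y))  # assumes at least two classes
    scores = []
    for f in range(len(X[0])):
        col = {c: [row[f] for row, label in zip(X, y) if label == c] for c in classes}
        scores.append(min(pairwise_separation(col[a], col[b])
                          for a, b in combinations(classes, 2)))
    ranked = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
    return sorted(ranked[:k])


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x


def stage2_fisher(X, y, ridge=1e-6):
    """Stage 2 stand-in (two classes only): Fisher direction w = Sw^-1 (m1 - m0),
    which maximizes between-class distance relative to within-class scatter."""
    c0, c1 = sorted(set(y))  # raises ValueError unless exactly two classes
    d = len(X[0])
    groups = {c: [row for row, label in zip(X, y) if label == c] for c in (c0, c1)}
    means = {c: [mean(col) for col in zip(*rows)] for c, rows in groups.items()}
    # Within-class scatter matrix, plus a small ridge term to keep it invertible.
    Sw = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for c, rows in groups.items():
        for row in rows:
            diff = [v - m for v, m in zip(row, means[c])]
            for i in range(d):
                for j in range(d):
                    Sw[i][j] += diff[i] * diff[j]
    return solve(Sw, [m1 - m0 for m0, m1 in zip(means[c0], means[c1])])


def tsdr_sketch(X, y, k):
    """Hypothetical pipeline: stage-1 selection, then stage-2 1-D projection."""
    kept = stage1_select(X, y, k)
    Xk = [[row[f] for f in kept] for row in X]
    w = stage2_fisher(Xk, y)
    projected = [sum(wi * xi for wi, xi in zip(w, row)) for row in Xk]
    return kept, w, projected


# Toy data: feature 0 separates the classes cleanly, feature 1 is noise,
# feature 2 separates them less cleanly.
X = [[0.0, 5.0, 1.0], [0.2, 1.0, 1.2], [0.1, 3.0, 0.9],
     [2.0, 4.0, 1.5], [2.1, 2.0, 1.4], [1.9, 3.0, 1.6]]
y = [0, 0, 0, 1, 1, 1]
kept, w, projected = tsdr_sketch(X, y, k=2)
print(kept)  # [0, 2]
print([round(p, 2) for p in projected])
```

Taking the minimum over class pairs, rather than the average, is the design choice aimed at class masking. An average can look good while one pair of neighbouring classes still overlaps. The ridge term only guards against a singular within-class scatter matrix on tiny samples.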
ISSN: 2076-3417
DOI: 10.3390/app14031269