A virtual multi-label approach to imbalanced data classification

Bibliographic Details
Published in: Communications in Statistics - Simulation and Computation, Vol. 53, No. 3, pp. 1461-1471
Main Authors: Chou, Elizabeth P.; Yang, Shan-Ping
Format: Journal Article
Language: English
Published: Philadelphia: Taylor & Francis, 03.03.2024

Summary: One of the most challenging issues in machine learning is imbalanced data analysis. In this type of research, correctly predicting minority labels is usually more critical than correctly predicting majority labels, yet traditional machine learning techniques easily lead to learning bias: traditional classifiers tend to place all subjects in the majority group, resulting in biased predictions. Studies of this problem are typically conducted from one of two perspectives: a data-based perspective or a model-based perspective. Oversampling and undersampling are examples of data-based approaches, while adding costs, penalties, or weights to the optimization of the algorithm is typical of model-based approaches. Some ensemble methods have also been studied recently. These methods can cause various problems, such as overfitting, the omission of some information, and long computation times, and they do not apply to all kinds of datasets. To address these problems, a virtual labels (ViLa) approach for the majority label is proposed to solve the imbalance problem, and a new multiclass classification approach with the equal K-means clustering method is demonstrated in the study. The proposed method is compared with commonly used methods for the imbalance problem, such as sampling methods (oversampling, undersampling, and SMOTE) and classifier methods (SVM and one-class SVM). The results show that the proposed method performs better as the degree of data imbalance increases, gradually outperforming the other methods.
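The virtual-label idea summarized above lends itself to a short illustration. The sketch below is a hedged approximation, not the authors' exact ViLa/equal K-means procedure: it uses scikit-learn's standard KMeans (which does not enforce equal-sized clusters) to split the majority class into virtual sub-labels, trains an ordinary multiclass classifier, and then collapses predictions back to the original binary labels. The toy dataset, the random-forest classifier, and the rule for choosing the number of clusters are all assumptions made for illustration.

# Minimal sketch of the virtual-label idea: split the majority class into
# several sub-labels via clustering so the resulting multiclass problem is
# roughly balanced, then map predictions back to the original binary labels.
# Illustrative only; plain KMeans does not guarantee equal cluster sizes.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Imbalanced toy data: label 0 is the majority class, label 1 the minority.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05],
                           n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Assumption: choose the number of virtual labels so each majority cluster
# is roughly the size of the minority class.
n_min = int((y_tr == 1).sum())
n_maj = int((y_tr == 0).sum())
k = max(2, n_maj // n_min)

# Replace the single majority label 0 with k virtual sub-labels 2..k+1.
maj_idx = np.where(y_tr == 0)[0]
clusters = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_tr[maj_idx])
y_virtual = y_tr.copy()
y_virtual[maj_idx] = clusters + 2          # label 1 stays the minority class

# Train an ordinary multiclass classifier on the virtual labels.
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_virtual)

# Collapse predictions back to binary: any virtual label means "majority".
y_pred = np.where(clf.predict(X_te) == 1, 1, 0)
print(classification_report(y_te, y_pred, digits=3))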
ISSN: 0361-0918, 1532-4141
DOI: 10.1080/03610918.2022.2049820