Evolutionary simultaneous under and oversampling of instances for dealing with class-imbalance datasets in multilabel problems
Published in | Applied soft computing Vol. 159; p. 111618 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V, 01.07.2024 |
Summary: | Multilabel classification has recently attracted great attention from the data mining research community. Multilabel classification is concerned with learning where each instance can be associated with multiple classes (or labels). Class-imbalance problems appear in any classification task when the class distribution of the instances is very different. In multilabel classification, this problem is ubiquitous, as a large percentage of labels suffer from a class-imbalanced distribution. The adaptation of single-label methods to deal with the class-imbalance problem in multilabel learning is problematic as many of their basic concepts are not easily transferred. In this paper, we propose the use of evolutionary computation to simultaneously oversample the minority class and undersample the majority class for multilabel problems. Letting the algorithm autonomously select the instances to undersample and oversample allows us to extend these two successful paradigms to the multilabel task. An extensive comparison setup of 35 datasets shows the advantages of using this approach to deal with class-imbalance datasets for multilabel problems compared with previously published methods as well as the basic classification algorithms with the original datasets.
• We propose a new approach for dealing with the class-imbalance problem in multi-label datasets.
• The method is based on evolutionary simultaneous under- and oversampling.
• The method is scalable and achieves better results than current approaches. |
---|---|
ISSN: | 1568-4946 1872-9681 |
DOI: | 10.1016/j.asoc.2024.111618 |
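
The summary describes letting an evolutionary algorithm choose which instances to undersample and which to oversample. A minimal sketch of that general idea follows, not the authors' method: the record does not specify the encoding, operators, or fitness function, so everything here is an assumption. Each individual is assumed to be a vector of per-instance copy counts (0 removes an instance, 1 keeps it, 2 or more duplicates it), and the fitness is a hypothetical stand-in (mean per-label imbalance) rather than classifier performance. The names `mean_label_imbalance`, `evolve_resampling`, and `apply_counts` are illustrative only.

```python
import random


def mean_label_imbalance(counts, Y):
    """Mean over labels of |positives - negatives| / total after resampling.

    counts[i] is how many copies of instance i are kept; Y[i][j] is 1 if
    instance i carries label j. Returns 0.0 when every label is balanced.
    """
    total = sum(counts)
    if total == 0:
        return 1.0  # an empty dataset is treated as the worst case
    n_labels = len(Y[0])
    gaps = []
    for j in range(n_labels):
        pos = sum(c * row[j] for c, row in zip(counts, Y))
        gaps.append(abs(2 * pos - total) / total)  # |pos - neg| / total
    return sum(gaps) / n_labels


def evolve_resampling(Y, pop_size=30, generations=50, max_copies=3,
                      mutation_rate=0.05, seed=0):
    """Search for per-instance copy counts with a simple elitist GA."""
    rng = random.Random(seed)
    n = len(Y)
    pop = [[rng.randint(0, max_copies) for _ in range(n)]
           for _ in range(pop_size - 1)]
    pop.append([1] * n)  # include the original dataset as a baseline
    for _ in range(generations):
        pop.sort(key=lambda ind: mean_label_imbalance(ind, Y))
        elite = pop[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(elite), rng.choice(elite)
            # Uniform crossover, then per-gene random-reset mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [rng.randint(0, max_copies)
                     if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda ind: mean_label_imbalance(ind, Y))


def apply_counts(rows, counts):
    """Materialize the resampled dataset: repeat each row counts[i] times."""
    return [row for row, c in zip(rows, counts) for _ in range(c)]
```

Calling `evolve_resampling(Y)` on a binary label matrix returns the copy counts, and `apply_counts` can then be applied to both the feature rows and the label rows to build the resampled dataset. In a realistic setting the fitness would more plausibly be a validation score of a multilabel classifier trained on the resampled data; the imbalance proxy is used here only to keep the sketch self-contained.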