Evolutionary simultaneous under and oversampling of instances for dealing with class-imbalance datasets in multilabel problems

Bibliographic Details
Published in Applied soft computing Vol. 159; p. 111618
Main Authors García-Pedrajas, Nicolás, Cuevas-Muñoz, José M., de Haro-García, Aida
Format Journal Article
Language English
Published Elsevier B.V 01.07.2024
Summary: Multilabel classification has recently attracted great attention from the data mining research community. Multilabel classification is concerned with learning where each instance can be associated with multiple classes (or labels). Class-imbalance problems appear in any classification task when the class distribution of the instances is very different. In multilabel classification, this problem is ubiquitous, as a large percentage of labels suffer from a class-imbalanced distribution. The adaptation of single-label methods to deal with the class-imbalance problem in multilabel learning is problematic as many of their basic concepts are not easily transferred. In this paper, we propose the use of evolutionary computation to simultaneously oversample the minority class and undersample the majority class for multilabel problems. Letting the algorithm autonomously select the instances to undersample and oversample allows us to extend these two successful paradigms to the multilabel task. An extensive comparison setup of 35 datasets shows the advantages of using this approach to deal with class-imbalance datasets for multilabel problems compared with previously published methods as well as the basic classification algorithms with the original datasets.
• We propose a new approach for dealing with the class-imbalance problem in multi-label datasets.
• The method is based on an evolutionary simultaneous under and oversampling.
• The method is scalable and achieves better results than current approaches.
ISSN: 1568-4946, 1872-9681
DOI: 10.1016/j.asoc.2024.111618
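
The summary describes letting an evolutionary algorithm choose which instances to undersample and which to oversample. The record does not give the paper's encoding, fitness function, or operators, so the following is only a minimal illustrative sketch and not the authors' method. It assumes a hypothetical encoding in which each gene is the number of copies of an instance kept in the training set (0 removes it, 2 or more duplicates it), and it uses the mean per-label majority/minority ratio as a stand-in fitness, where a real system would more plausibly score candidate samplings by classifier performance. All names (`mean_imbalance`, `evolve`, `max_copies`, `p_mut`) are invented for this example.

```python
import random


def mean_imbalance(Y, copies):
    """Mean over labels of the majority/minority count ratio after resampling.

    Y: list of 0/1 label vectors, one per instance.
    copies: copies[i] is how many times instance i appears (0 drops it).
    """
    n_labels = len(Y[0])
    total = 0.0
    for j in range(n_labels):
        pos = sum(c for y, c in zip(Y, copies) if y[j] == 1)
        neg = sum(c for y, c in zip(Y, copies) if y[j] == 0)
        minority, majority = min(pos, neg), max(pos, neg)
        # A label left with an empty class is unusable, so penalise it heavily.
        total += majority / minority if minority > 0 else 1e6
    return total / n_labels


def evolve(Y, pop_size=30, generations=40, max_copies=3, p_mut=0.05, seed=0):
    """Toy genetic algorithm over per-instance copy counts (assumed encoding)."""
    rng = random.Random(seed)
    n = len(Y)

    def fitness(ind):
        return mean_imbalance(Y, ind)  # lower is better

    # Seed the population with the untouched dataset (one copy of everything).
    pop = [[1] * n] + [
        [rng.randint(0, max_copies) for _ in range(n)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]  # elitism: the two best individuals survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            a = min(rng.sample(pop, 3), key=fitness)
            b = min(rng.sample(pop, 3), key=fitness)
            # Uniform crossover followed by per-gene mutation.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            child = [
                rng.randint(0, max_copies) if rng.random() < p_mut else g
                for g in child
            ]
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)


# Toy multilabel data: 12 instances, 3 labels, each label rarely positive.
Y = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]] + [[0, 0, 0]] * 8
best = evolve(Y)
print("copies per instance:", best)
print("imbalance before:", mean_imbalance(Y, [1] * len(Y)))
print("imbalance after: ", mean_imbalance(Y, best))
```

Because one gene controls both removal and duplication of an instance, a single individual undersamples and oversamples at once, which is the property the summary emphasises. Elitism together with the all-ones seed individual means the returned sampling is never scored worse than the original dataset under this proxy fitness.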