Enhancing soft computing techniques to actively address imbalanced regression problems

Bibliographic Details
Published in: Expert Systems with Applications, Vol. 234, p. 121011
Main Authors: Arteaga, María; Gacto, María José; Galende, Marta; Alcalá-Fdez, Jesús; Alcalá, Rafael
Format: Journal Article
Language: English
Published: Elsevier Ltd, 30.12.2023

Summary: While research on imbalance, understood as classes that are not equally represented, has focused mainly on classification, it has hardly been studied in regression. In regression, data maldistribution, or imbalance, can be defined as the existence of specific subdomains of the output variable that are misrepresented in the training data set, resulting in low accuracy for new instances within these subdomains. The few state-of-the-art techniques are "passive", meaning they are applied only in preprocessing. In this contribution, we propose two new evolutionary algorithms based on fuzzy rules that "actively" address imbalanced regression problems and improve the overall performance of the algorithms, rather than only addressing the imbalance problem. Statistical tests on 32 regression datasets, covering more than 3000 partitions, show the effectiveness of the proposed methods compared with the best previous proposal, a passive method called SMOGN. We conclude that: (1) because the equality hypotheses were not rejected, we cannot affirm that there are significant performance differences between using stratified and non-stratified data, so we use stratification to preserve a minimum representation of the minority set; (2) both fuzzy rule-based methods obtain better results in the imbalance metric when using SMOGN, but in both methods this incurs a cost in accuracy (with confidence scores of over 99.0%); and (3) the proposed methods outperform those using SMOGN. They obtain slightly better results in the imbalance metric, with better average ranks in both proposals, and significantly better results in global accuracy. That is, all the performance metrics studied improve statistically with a confidence score of over 99.0%, except for one metric, which scores above 90.0%.
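The summary's point about using stratification to preserve a minimum representation of the minority set can be illustrated with a minimal sketch. The record does not say how the authors define the minority subdomain or build their partitions. As an assumption, this sketch treats targets outside Tukey's box-plot fences as the rare subdomain. The hypothetical helpers `rare_mask` and `stratified_folds` then deal rare and common examples round-robin, so that every fold keeps a share of the minority set. This is neither SMOGN nor the proposed fuzzy rule-based methods.

```python
import statistics
from typing import List


def rare_mask(y: List[float], k: float = 1.5) -> List[bool]:
    """Flag targets outside Tukey's fences as the rare subdomain.

    Assumption for illustration only: the paper's actual minority
    definition is not given in this record.
    """
    q1, _, q3 = statistics.quantiles(y, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v < lo or v > hi for v in y]


def stratified_folds(y: List[float], n_folds: int = 5) -> List[List[int]]:
    """Hypothetical stratified split: rare indices are dealt first,
    then common ones, using one running counter so that fold sizes
    stay balanced and each fold receives rare examples when enough exist."""
    mask = rare_mask(y)
    rare = [i for i, m in enumerate(mask) if m]
    common = [i for i, m in enumerate(mask) if not m]
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    pos = 0
    for group in (rare, common):
        for idx in group:
            folds[pos % n_folds].append(idx)
            pos += 1
    return folds


if __name__ == "__main__":
    y = [1.0, 1.2, 0.9, 1.1, 1.0, 0.95, 1.05, 1.15, 10.0, 12.0]
    print(rare_mask(y))            # only the two extreme targets are flagged
    print(stratified_folds(y, 2))  # each fold holds one of them
```

Dealing the rare group first is a deliberate choice. It guarantees that the minority examples are spread across folds before the common examples fill the remaining slots. A random, non-stratified split of a small dataset could leave some folds with no rare targets at all.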
• Focussing on imbalanced "Regression" problems (not Classification).
• Imbalance is rarely addressed in the current literature for continuous prediction.
• Analyzing the evolution of the few metrics and techniques currently published.
• Proposing two new specific evolutionary methods to actively address this problem.
• Statistically outperforming the state-of-the-art SMOGN in accuracy and F1 metric.
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2023.121011