Handling Inter-class and Intra-class Imbalance in Class-imbalanced Learning
Main Authors | , , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 24.11.2021 |
Subjects | |
Summary: | Class-imbalance is a common problem in machine learning practice. Typical
Imbalanced Learning (IL) methods balance the data via intuitive class-wise
resampling or reweighting. However, previous studies suggest that beyond
class-imbalance, intrinsic data difficulty factors like overlapping, noise, and
small disjuncts also play critical roles. To handle them, many solutions have
been proposed (e.g., noise removal, borderline sampling, hard example mining)
but are still confined to a specific factor and cannot generalize to broader
scenarios, which raises an interesting question: how to handle both
class-agnostic difficulties and the class-imbalance in a unified way? To answer
this, we consider both class-imbalance and its orthogonal counterpart,
intra-class imbalance, i.e., the imbalanced distribution over easy and hard
samples. Such a distribution naturally reflects the complex influence of
class-agnostic intrinsic data difficulties, thus providing a new unified view for identifying
and handling these factors during learning. From this perspective, we discuss
the pros and cons of existing IL solutions and further propose new balancing
techniques for more robust and efficient IL. Finally, we wrap up all solutions
into a generic ensemble IL framework, namely DuBE (Duple-Balanced Ensemble). It
features explicit and efficient inter- and intra-class balancing as well as easy
extension with standardized APIs. Extensive experiments validate the
effectiveness of DuBE. Code, examples, and documentation are available at
https://github.com/AnonAuthorAI/duplebalance and
https://duplebalance.readthedocs.io. |
---|---|
DOI: | 10.48550/arxiv.2111.12791 |
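
The summary's idea of balancing across classes and across easy and hard samples at once can be illustrated with a small resampling sketch. This is not the DuBE implementation or its API. The function name `duple_balanced_indices`, the hardness definition (absolute error of a predicted probability), the equal-width hardness bins, and the choice to downsample every class to the minority size are all assumptions made for illustration.

```python
import numpy as np


def duple_balanced_indices(y, proba_pos, n_bins=5, seed=0):
    """Hypothetical helper: pick indices balanced over classes and hardness bins.

    y         : binary labels in {0, 1}
    proba_pos : a current model's predicted probability of class 1
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    # Assumed hardness measure: absolute error of the predicted probability.
    hardness = np.abs(y - np.asarray(proba_pos, dtype=float))
    classes, counts = np.unique(y, return_counts=True)
    # Inter-class balancing: every class is resampled to the minority size.
    n_per_class = int(counts.min())
    # Map hardness in [0, 1] to equal-width bin ids 0 .. n_bins - 1.
    bin_ids = np.minimum((hardness * n_bins).astype(int), n_bins - 1)
    chosen = []
    for c in classes:
        idx = np.flatnonzero(y == c)
        bins = bin_ids[idx]
        bin_sizes = np.bincount(bins, minlength=n_bins)
        # Intra-class balancing: weight each sample by 1 / (size of its bin),
        # so each non-empty hardness bin carries equal total probability.
        weights = 1.0 / bin_sizes[bins]
        chosen.append(
            rng.choice(idx, size=n_per_class, replace=True, p=weights / weights.sum())
        )
    return np.concatenate(chosen)


# Toy data: 90 majority samples (mostly easy) and 10 minority samples.
y = np.array([0] * 90 + [1] * 10)
proba = np.concatenate([np.full(80, 0.05), np.full(10, 0.7), np.full(10, 0.4)])
idx = duple_balanced_indices(y, proba)
```

In this toy run the 10 hard majority samples receive the same total sampling probability as the 80 easy ones, so the resampled majority set is no longer dominated by easy examples, while both classes end up with the same number of draws.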