CUS-heterogeneous ensemble-based financial distress prediction for imbalanced dataset with ensemble feature selection
Published in | Applied soft computing Vol. 97; p. 106758 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V 01.12.2020 |
Subjects | |
Summary: | Since the global financial crisis of 2008 left a large number of companies in financial distress, machine learning-based prediction of such distress has proven highly practical for economic stakeholders. Within machine learning, most previous studies focus only on improving sampling methods for imbalanced datasets or on introducing multiple classifiers at the model construction stage. In view of this, this paper attempts to extend the scope and depth of ensembling to achieve better prediction performance on a severely imbalanced dataset of financial data from Chinese listed companies. For the first time, this paper combines clustering-based under-sampling (CUS) with the gradient boosting decision tree (GBDT) to construct a model, which is used alongside the widely used extreme gradient boosting (XGBoost) as heterogeneous classifiers to build a heterogeneous ensemble for financial distress prediction. In addition, following the ensemble idea, this paper selects features with five feature selection methods grounded in different theoretical backgrounds, and introduces ensembling throughout the whole process of feature selection, data preprocessing and model construction. In the comparative experiments, the proposed method achieves the best performance on the test set. This study demonstrates the broad applicability of CUS to financial data processing and the superior generalization performance of the ensemble model relative to individual learners.
• We propose a CUS-heterogeneous ensemble-based financial distress prediction model.
• We construct a new feature set in an ensembled manner through five feature selection methods.
• We present a novel combination approach of GBDT and CUS to handle severely imbalanced financial data.
• We found the proposed approach is more accurate and efficient than benchmark models. |
---|---|
ISSN: | 1568-4946 1872-9681 |
DOI: | 10.1016/j.asoc.2020.106758 |
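
The summary names clustering-based under-sampling (CUS) but this record does not spell out its steps. The sketch below illustrates one common variant, under stated assumptions: the majority class is clustered into as many groups as there are minority samples, and the real sample nearest each centroid is kept. The function names, the toy data, and the choice of plain k-means are all hypothetical and are not taken from the paper; the balanced set would then be passed to a classifier such as GBDT, which is not shown here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster emptied out
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_under_sample(majority, minority, seed=0):
    """Reduce the majority class to len(minority) real samples.

    Assumption: one cluster per minority sample, and each cluster is
    represented by the majority sample closest to its centroid.
    """
    centroids = kmeans(majority, k=len(minority), seed=seed)
    remaining = list(majority)
    kept = []
    for c in centroids:
        nearest = min(remaining, key=lambda p: squared_distance(p, c))
        kept.append(nearest)
        remaining.remove(nearest)  # avoid picking the same sample twice
    return kept


# Toy data: 12 "healthy" firms vs 3 "distressed" firms,
# each described by two made-up financial ratios.
rng = random.Random(42)
healthy = [(rng.gauss(0.6, 0.1), rng.gauss(0.2, 0.05)) for _ in range(12)]
distressed = [(rng.gauss(0.1, 0.1), rng.gauss(0.7, 0.05)) for _ in range(3)]

balanced_healthy = cluster_under_sample(healthy, distressed)
X = balanced_healthy + distressed
y = [0] * len(balanced_healthy) + [1] * len(distressed)
print(len(balanced_healthy), len(distressed))
```

Keeping real samples rather than the centroids themselves is a design choice made here so that every training row corresponds to an actual company; using the centroids directly is another variant seen in practice.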