Class imbalance problem in short-term solar flare prediction

Bibliographic Details
Published in	Research in Astronomy and Astrophysics Vol. 21; no. 9; p. 237
Main Authors	Wan, Jie; Fu, Jun-Feng; Liu, Jin-Fu; Shi, Jia-Kui; Jin, Cheng-Gang; Zhang, Huai-Peng
Format	Journal Article
Language	English
Published	Beijing: National Astronomical Observatories, CAS and IOP Publishing Ltd, 01.11.2021
Author Affiliations	School of Electrical Engineering and Automation, Harbin Institute of Technology, Harbin 150001, China; School of Energy Science and Engineering, Harbin Institute of Technology, Harbin 150001, China; Laboratory for Space Environment and Physical Sciences, Harbin Institute of Technology, Harbin 150001, China
Subjects	Algorithms; Categories; Data mining; Datasets; Decision making; methods: data analysis; Oversampling; Prediction models; Resampling; Solar flares; Sun: flares; Sun: magnetic fields; Sun: sunspots; Sun: X-rays, gamma rays; The Sun; Training

Summary:	Using data-driven algorithms to accurately forecast solar flares requires reliable data sets. A solar flare data set consists mostly of non-flaring samples, with only a small percentage of flaring samples. In data mining this is known as the class imbalance problem. During training, the prediction model is biased toward the majority class of the original data set. The class imbalance problem therefore needs to be discussed systematically when building a flare prediction model from observational data. Three strategies are proposed to address class imbalance, corresponding to the data set, the loss function, and the training process. Type I resamples the training samples: oversampling the minority class, undersampling the majority class, or a mix of both. Type II usually changes the decision boundary by assigning different weights to the prediction losses of the majority and minority classes. Type III assigns different weights to the training samples, with smaller weights for majority-class samples and larger weights for minority-class samples, to improve the training process of the prediction model. The main work of this paper is to compare these imbalance-handling methods when building a flare prediction model and to try to find the optimal strategy. The results show that oversampling and sample weighting perform better than the other strategies on most evaluation metrics, while resampling and changing the decision boundary generalize better.
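The three strategy types in the summary can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names, the 95/5 toy class split, and the "balanced" inverse-frequency weighting are assumptions chosen for illustration, and Type III is simplified here to give each sample its class weight.

```python
import numpy as np

def random_oversample(X, y, rng):
    """Type I (illustrative): duplicate minority rows until classes are equal in size."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    deficit = counts.max() - counts.min()
    minority_idx = np.flatnonzero(y == minority)
    # Draw extra minority rows with replacement to fill the deficit.
    extra = rng.choice(minority_idx, size=deficit, replace=True)
    keep = np.concatenate([np.arange(len(y)), extra])
    return X[keep], y[keep]

def balanced_class_weights(y):
    """Type II (illustrative): per-class loss weights inversely proportional to frequency."""
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

def sample_weights(y):
    """Type III (simplified assumption): each sample inherits its class weight."""
    cw = balanced_class_weights(y)
    return np.array([cw[label] for label in y.tolist()])

# Toy data (assumed): 95 non-flaring (0) and 5 flaring (1) samples, 3 features.
rng = np.random.default_rng(0)
y = np.array([0] * 95 + [1] * 5)
X = rng.normal(size=(100, 3))

X_res, y_res = random_oversample(X, y, rng)   # Type I: resampled training set
class_w = balanced_class_weights(y)           # Type II: weights for the loss terms
sample_w = sample_weights(y)                  # Type III: weights per training sample
```

In practice, the Type II weights would scale the per-class terms of the loss function, while the Type III weights would be passed to a learner that accepts per-sample weights during fitting.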
ISSN:	1674-4527 2397-6209
DOI:	10.1088/1674-4527/21/9/237