Handling Imbalanced Data in Predictive Maintenance: A Resampling-Based Approach

Bibliographic Details
Published in	2023 5th International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), pp. 1-6
Main Authors	Cicak, Sejma; Avci, Umut
Format	Conference Proceeding
Language	English
Published	IEEE 08.06.2023
Subjects	classification; Data models; Human computer interaction; imbalanced data; Machine learning; Machine learning algorithms; Prediction algorithms; predictive maintenance; Robots; Task analysis
Summary:	Imbalanced data is a common problem in many areas, and it can significantly affect the performance and generalizability of machine learning models, because the models fail to build a good representation of the minority-class examples. This study aims to improve classification performance in predictive maintenance tasks, where the data is generally imbalanced. To this end, we use resampling methods that aim to create balanced data. We present various oversampling and undersampling techniques and apply them to both synthetic and real-world datasets. We then run classification experiments on the imbalanced and balanced datasets using different classifiers and compare their performance. More importantly, we evaluate the effectiveness of the resampling techniques to provide insight into their usefulness in handling class imbalance. Our study contributes to the growing body of literature on class imbalance in classification tasks and provides practical guidance for selecting appropriate sampling methods based on the characteristics of the dataset.
DOI:	10.1109/HORA58378.2023.10156799