An Imbalanced-Data Processing Algorithm for the Prediction of Heart Attack in Stroke Patients

Bibliographic Details
Published in IEEE Access Vol. 9; pp. 25394 - 25404
Main Authors Wang, Meng; Yao, Xinghua; Chen, Yixiang
Format Journal Article
Language English
Published Piscataway IEEE 2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Predicting heart attack in stroke patients early through data analysis is one approach to reducing a high mortality rate. Stroke-patient data from the Intensive Care Unit are imbalanced because stroke patients who have a heart attack are a minority of all stroke patients, which makes predicting heart attack from such data a challenge. To process the imbalanced data, this paper designs an algorithm that combines random undersampling, clustering, and oversampling techniques, called the undersampling-clustering-oversampling algorithm (UCO algorithm for short). The UCO algorithm generates nearly balanced data, which are used to train machine-learning models for predicting heart attack. Extensive experiments on the Medical Information Mart for Intensive Care III database are conducted to evaluate the UCO algorithm. The UCO algorithm with an undersampling number of 120, denoted UCO(120), performs well in helping machine-learning classifiers extract features. Five classifiers are separately deployed to predict heart attack based on the outputs of UCO(120). The results show that the random forest classifier achieves the best predictive performance, with an accuracy of 70.29% and a precision of 70.05%. Using UCO(120) together with random forest, whether a stroke patient will have a heart attack can be predicted well.
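The summary names the three ingredients of the UCO algorithm (random undersampling, clustering, oversampling) but does not say how they are combined, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that the majority class (stroke patients without heart attack) is randomly undersampled to `n_under` records, that the minority class is clustered with a plain k-means, and that synthetic minority records are created by interpolating between two records of the same cluster until the classes match in size. The function names `kmeans_labels` and `uco_sketch`, the parameter `k`, and the interpolation step are all hypothetical.

```python
import random


def _dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_labels(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: _dist2(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def uco_sketch(majority, minority, n_under, k=2, seed=0):
    """Hypothetical undersample -> cluster -> oversample pipeline.

    majority, minority: non-empty lists of numeric feature vectors.
    Returns (X, y) with label 0 for majority and 1 for minority records.
    """
    rng = random.Random(seed)

    # Step 1 (assumed): randomly undersample the majority class.
    kept_majority = rng.sample(majority, min(n_under, len(majority)))

    # Step 2 (assumed): cluster the minority class.
    labels = kmeans_labels(minority, min(k, len(minority)), rng)
    clusters = {}
    for p, lab in zip(minority, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())

    # Step 3 (assumed): oversample by interpolating inside one cluster
    # until the minority class is as large as the kept majority.
    synthetic = []
    while len(minority) + len(synthetic) < len(kept_majority):
        group = rng.choice(groups)
        a, b = rng.choice(group), rng.choice(group)
        t = rng.random()
        synthetic.append([x + t * (y - x) for x, y in zip(a, b)])

    X = kept_majority + list(minority) + synthetic
    y = [0] * len(kept_majority) + [1] * (len(minority) + len(synthetic))
    return X, y
```

With `n_under=120`, mirroring the UCO(120) setting, a call such as `uco_sketch(majority, minority, n_under=120)` returns 120 majority records and, when the minority class has fewer than 120 records, 120 minority records. The resulting `X` and `y` could then be passed to any classifier, for example a random forest.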
ISSN:2169-3536
DOI:10.1109/ACCESS.2021.3057693