Data preprocessing for heart disease classification: A systematic literature review
Published in | Computer methods and programs in biomedicine Vol. 195; p. 105635 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Ireland, Elsevier B.V, 01.10.2020 |
Summary: | • A systematic review on the use of data preprocessing techniques for heart disease classification was conducted.
• A total of 49 studies published between January 2000 and June 2019 were selected and analyzed with respect to four review questions.
• A significant number of the selected studies were devoted to the data reduction task, and feature selection was the most used data reduction sub-task. Data cleaning was also a common task in classification for cardiology, dealing mainly with missing values and noise elimination in cardiac datasets.
• In general, preprocessing either maintained or improved the performance of heart disease classifiers.
• Researchers concentrated more on improving the accuracy rate of the developed models while neglecting other aspects such as time complexity and comprehensibility.
Early detection of heart disease is an important challenge, since heart diseases claim 17.3 million lives every year. Moreover, any error in the diagnosis of cardiac disease can be dangerous and put an individual's life at risk. Accurate diagnosis is therefore critical in cardiology. Data Mining (DM) classification techniques have been used to diagnose heart diseases but are still limited by data quality challenges such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques have therefore been used to prepare data, with the goal of improving the performance of DM-based heart disease prediction systems.
The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification as regards: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on the performance of classification in cardiology, (3) the overall performance of classifiers when using DP techniques, and (4) comparisons of different classifier-preprocessing combinations in terms of accuracy rate.
A systematic literature review was carried out by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published between January 2000 and June 2019. A total of 49 studies were selected and analyzed according to the aforementioned criteria.
The review results show that data reduction is the most used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations, such as (ANN + PCA), (ANN + CHI) and (SVM + PCA), are promising in terms of accuracy. However, the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to their lack of interpretability. |
---|---|
ISSN: | 0169-2607 1872-7565 |
DOI: | 10.1016/j.cmpb.2020.105635 |
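
The summary names missing-value handling (data cleaning) and feature selection, including chi-square (CHI), as the preprocessing tasks most often reported. The following is a minimal Python sketch of those two steps using only the standard library. It is not taken from the reviewed studies: the function names, the feature names (`chest_pain`, `sex`, `fasting_sugar`) and all data values are made-up illustrations, and mean imputation is just one of several possible ways to treat missing values.

```python
from collections import Counter
from statistics import mean


def impute_mean(column):
    """Data cleaning: replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def chi_square(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the class labels."""
    n = len(labels)
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(feature, labels))
    stat = 0.0
    for f_value, f_count in feature_counts.items():
        for l_value, l_count in label_counts.items():
            expected = f_count * l_count / n  # count expected under independence
            observed = joint_counts.get((f_value, l_value), 0)
            stat += (observed - expected) ** 2 / expected
    return stat


def select_k_best(features, labels, k):
    """Data reduction: keep the k features with the highest chi-square score."""
    scores = {name: chi_square(col, labels) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy records (1 = heart disease present, 0 = absent); not real patient data.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "chest_pain": [1, 1, 1, 1, 0, 0, 0, 0],
    "sex": [1, 0, 1, 0, 1, 0, 1, 0],
    "fasting_sugar": [1, 1, 1, 0, 1, 0, 0, 0],
}

cleaned = impute_mean([200, None, 240, 220])
selected, scores = select_k_best(features, labels, k=2)
print(cleaned)
print(selected, scores)
```

In a full pipeline of the kind the review compares, the cleaned and reduced feature set would then be passed to a classifier such as an ANN or SVM; that step is omitted here to keep the sketch dependency-free.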