Data preprocessing for heart disease classification: A systematic literature review

Bibliographic Details
Published in Computer methods and programs in biomedicine Vol. 195; p. 105635
Main Authors Benhar, H., Idri, A., Fernández-Alemán, J.L.
Format Journal Article
Language English
Published Ireland Elsevier B.V 01.10.2020
Summary:
• A systematic review on the use of data preprocessing techniques for heart disease classification was conducted.
• A total of 49 studies published between January 2000 and June 2019 were selected and analyzed with respect to four review questions.
• A significant number of the selected studies were devoted to the data reduction task. Feature selection was the most used data reduction sub-task. Moreover, data cleaning was a common task in classification for cardiology, and it dealt mainly with missing values and noise elimination in cardiac datasets.
• In general, preprocessing either maintained or improved the performance of heart disease classifiers.
• Researchers concentrated more on improving the accuracy rate of the models developed while neglecting other aspects such as time complexity and comprehensibility.

Early detection of heart disease is an important challenge, since 17.3 million people lose their lives to heart disease each year. Moreover, any error in the diagnosis of cardiac disease can be dangerous and put an individual's life at risk. Accurate diagnosis is therefore critical in cardiology. Data Mining (DM) classification techniques have been used to diagnose heart diseases but are still limited by data quality challenges such as inconsistencies, noise, missing data, outliers, high dimensionality and imbalanced data. Data preprocessing (DP) techniques were therefore used to prepare data with the goal of improving the performance of DM-based heart disease prediction systems. The purpose of this study is to review and summarize the current evidence on the use of preprocessing techniques in heart disease classification as regards: (1) the DP tasks and techniques most frequently used, (2) the impact of DP tasks and techniques on the performance of classification in cardiology, (3) the overall performance of classifiers when using DP techniques, and (4) comparisons of different classifier-preprocessing combinations in terms of accuracy rate.
A systematic literature review was carried out by identifying and analyzing empirical studies on the application of data preprocessing in heart disease classification published between January 2000 and June 2019. A total of 49 studies were selected and analyzed according to the aforementioned criteria. The review results show that data reduction is the most used preprocessing task in cardiology, followed by data cleaning. In general, preprocessing either maintained or improved the performance of heart disease classifiers. Some combinations, such as (ANN + PCA), (ANN + CHI) and (SVM + PCA), are promising in terms of accuracy. However, the deployment of these models in real-world diagnosis decision support systems is subject to several risks and limitations due to the lack of interpretation.
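
As a rough illustration of the data cleaning task mentioned in the summary, the sketch below handles missing values by column-mean imputation. It is only one simple option, not a method taken from the reviewed studies; the function name `impute_column_means` and the toy records are hypothetical, and the values are made up for the example.

```python
from statistics import mean


def impute_column_means(rows):
    """Replace None entries with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        col_means.append(mean(observed))
    # Build new rows; the input list is left unchanged.
    return [
        [col_means[j] if value is None else value for j, value in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records (age, resting blood pressure, cholesterol);
# None marks a missing measurement. The numbers are invented for illustration.
records = [
    [63, 145, 233],
    [37, None, 250],
    [41, 130, None],
    [56, 120, 236],
]

cleaned = impute_column_means(records)
for row in cleaned:
    print(row)
```

Mean imputation keeps every record but can distort a feature's distribution, so in practice its effect on a classifier would need to be checked empirically, as the review questions above suggest.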
ISSN:0169-2607
1872-7565
DOI:10.1016/j.cmpb.2020.105635