A Comprehensive Investigation of the Performances of Different Machine Learning Classifiers with SMOTE-ENN Oversampling Technique and Hyperparameter Optimization for Imbalanced Heart Failure Dataset
Published in | Scientific programming Vol. 2022; pp. 1 - 17 |
---|---|
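Main Authors | Muntasir Nishat, Mirza; Faisal, Fahim; Jahan Ratul, Ishrak; Al-Monsur, Abdullah; Ar-Rafi, Abrar Mohammad; Nasrullah, Sarker Mohammad; Reza, Md Taslim; Khan, Md Rezaul Hoque |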
Format | Journal Article |
Language | English |
Published | New York; Hindawi; 09.03.2022; John Wiley & Sons, Inc |
Abstract | Heart failure is a chronic cardiac condition in which impaired contractility of the heart muscle reduces the supply of blood to the body. Like other cardiac disorders, it is a serious ailment that limits the patient's activities and shortens lifespan, and it is often fatal. Predicting the survival of patients with heart failure supports effective intervention and a better prognosis in terms of both treatment and quality of life. Machine learning techniques can be valuable in this regard because they can predict survival in advance, allowing patients to receive appropriate treatment. Hence, six supervised machine learning algorithms have been studied and applied to a dataset of 299 individuals from the UCI Machine Learning Repository to predict their survival from heart failure. Three distinct approaches have been followed using the Decision Tree Classifier, Logistic Regression, Gaussian Naïve Bayes, Random Forest Classifier, K-Nearest Neighbors, and Support Vector Machine algorithms. Data scaling has been performed as a preprocessing step using the standard and min–max scaling methods. Grid search cross-validation and random search cross-validation have been employed to optimize the hyperparameters. Additionally, the combined synthetic minority oversampling technique and edited nearest neighbor (SMOTE-ENN) resampling technique has been applied, and the performances of all the approaches have been compared extensively. The experimental results indicate that the Random Forest Classifier (RFC) surpasses all other approaches, with a test accuracy of 90% when used in combination with SMOTE-ENN and standard scaling. This investigation therefore illustrates the applicability of different machine learning algorithms to such an imbalanced dataset and presents the role of SMOTE-ENN and hyperparameter optimization in enhancing their performance. |
---|---|
Author | Al-Monsur, Abdullah; Nasrullah, Sarker Mohammad; Jahan Ratul, Ishrak; Ar-Rafi, Abrar Mohammad; Reza, Md Taslim; Muntasir Nishat, Mirza; Khan, Md Rezaul Hoque; Faisal, Fahim |
Author_xml | 1. Muntasir Nishat, Mirza (Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu); 2. Faisal, Fahim (ORCID 0000-0001-9835-6299; Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu); 3. Jahan Ratul, Ishrak (Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu); 4. Al-Monsur, Abdullah (Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu); 5. Ar-Rafi, Abrar Mohammad (Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu); 6. Nasrullah, Sarker Mohammad (North South University, Dhaka, Bangladesh; northsouth.edu); 7. Reza, Md Taslim (Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu); 8. Khan, Md Rezaul Hoque (Islamic University of Technology, Gazipur, Bangladesh; iutoic-dhaka.edu) |
ContentType | Journal Article |
Copyright | Copyright © 2022 Mirza Muntasir Nishat et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0 |
DOI | 10.1155/2022/3649406 |
Discipline | Computer Science |
EISSN | 1875-919X |
Editor | Zhao, Qianchuan |
EndPage | 17 |
ISSN | 1058-9244 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
License | This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0001-9835-6299 |
OpenAccessLink | https://dx.doi.org/10.1155/2022/3649406 |
PageCount | 17 |
PublicationDate | 2022-03-09 |
PublicationPlace | New York |
PublicationTitle | Scientific programming |
PublicationYear | 2022 |
Publisher | Hindawi; John Wiley & Sons, Inc |
StartPage | 1 |
SubjectTerms | Accuracy; Age; Algorithms; Cardiovascular disease; Classification; Classifiers; Comparative analysis; Creatinine; Data mining; Datasets; Decision trees; Ejection fraction; Family medical history; Heart failure; Machine learning; Medical prognosis; Muscles; Optimization; Oversampling; Patients; Resampling; Risk factors; Scaling; Support vector machines; Survivability; Survival |
Title | A Comprehensive Investigation of the Performances of Different Machine Learning Classifiers with SMOTE-ENN Oversampling Technique and Hyperparameter Optimization for Imbalanced Heart Failure Dataset |
URI | https://dx.doi.org/10.1155/2022/3649406 https://www.proquest.com/docview/2640854285 |
Volume | 2022 |
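The abstract describes standard scaling followed by SMOTE-ENN resampling before the classifiers are trained. The sketch below illustrates that idea only; it is not the authors' code. It assumes a binary problem with at least two minority samples, uses a majority-vote form of edited nearest neighbors, and all function names and the toy data are hypothetical. A real study would more likely rely on a library implementation such as the `SMOTEENN` class in imbalanced-learn, whose details may differ.

```python
import math
import random
from collections import Counter


def standard_scale(rows):
    """Rescale every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def nearest(i, X, candidates, k):
    """Return the indices of the k candidates closest to X[i], excluding i itself."""
    others = [j for j in candidates if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def smote(X, y, minority, k=5, rng=None):
    """Oversample: add interpolated minority points until the two classes are balanced."""
    rng = rng or random.Random(0)
    X, y = [list(row) for row in X], list(y)
    seeds = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(seeds)) - len(seeds)
    for _ in range(max(n_new, 0)):
        a = rng.choice(seeds)
        b = rng.choice(nearest(a, X, seeds, k))  # needs at least two minority samples
        gap = rng.random()
        X.append([p + gap * (q - p) for p, q in zip(X[a], X[b])])
        y.append(minority)
    return X, y


def edited_nearest_neighbours(X, y, k=3):
    """Clean: drop each sample whose label loses the majority vote of its k neighbours."""
    everyone = range(len(X))
    keep = [i for i in everyone
            if Counter(y[j] for j in nearest(i, X, everyone, k)).most_common(1)[0][0] == y[i]]
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, minority, rng=None):
    """SMOTE followed by ENN, applied to already scaled features."""
    X_over, y_over = smote(X, y, minority, rng=rng)
    return edited_nearest_neighbours(X_over, y_over)


# Toy data: eight majority points, one majority point sitting inside the
# minority cluster (a noisy sample), and three minority points.
X_toy = [[0, 0], [0.5, 0], [0, 0.5], [0.5, 0.5], [1, 0], [0, 1], [1, 1], [0.5, 1],
         [5.2, 5.2], [5, 5], [5.5, 5], [5, 5.5]]
y_toy = [0] * 9 + [1] * 3

X_res, y_res = smote_enn(standard_scale(X_toy), y_toy, minority=1)
print(sorted(Counter(y_res).items()))
```

On this toy data SMOTE raises the minority class from 3 to 9 samples, and the cleaning step then removes the one majority point surrounded by minority neighbors, so the script should print `[(0, 8), (1, 9)]`. Resampling of this kind is normally applied to the training split only, so that the test accuracy is measured on untouched data.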