Integration of aggressive bound tightening and Mixed Integer Programming for Cost-sensitive feature selection in medical diagnosis

Bibliographic Details
Published in Expert systems with applications Vol. 187; p. 115902
Main Authors Abdulla, Mai; Khasawneh, Mohammad T.
Format Journal Article
Language English
Published New York Elsevier Ltd 01.01.2022
Elsevier BV
Summary: Silent diseases is an umbrella term for a spectrum of chronic illnesses that produce no clinically obvious signs and are diagnosed at advanced stages, when the damage is irreversible. Current diagnostic strategies for silent diseases depend on self-reported symptoms and behavior observed over extended periods of time, and there are as yet no specific clinical tests to diagnose them. Scientific research suggests that early diagnosis is important to restore functionality and reduce disease-related complications. Previous studies have primarily focused on feature selection methods to aid medical diagnosis. Traditional feature selection methods focus on correct classification and often ignore feature costs, that is, the cost of the clinical tests required to acquire each feature value. In medical diagnosis, however, features have different associated costs. Because ignoring feature costs may result in a high-cost diagnostic strategy that cannot be used in practice, developing a low-cost diagnostic strategy remains a subject of much interest. In this paper, new Mixed Integer Programming (MIP) models, namely Cost-sensitive Support Vector Machine (CS-SVM) and Cost-sensitive Multi-surface Method Tree (CS-MSMT), are proposed that allow simultaneous selection of low-cost and informative features. An advantage of the CS-SVM and CS-MSMT is that they can account for shared costs. The models were modified to embed shared costs across feature groups, and the modified versions are termed Discounted CS-SVM (dCS-SVM) and Discounted CS-MSMT (dCS-MSMT), respectively. A computationally effective algorithm that integrates aggressive bound tightening with the MIP formulation is proposed. To demonstrate the effectiveness of the proposed models, different analysis paradigms are conducted on six UCI medical datasets: Chronic Kidney Disease, Hepatitis, Heart Disease, Thyroid, Diabetes and Leukemia. The results demonstrate the efficiency and robustness of the CS-SVM and CS-MSMT (and consequently the dCS-SVM and dCS-MSMT) under various conditions. The CS-SVM and CS-MSMT improved accuracy by 10.3% and 3.4% and reduced costs by 94.3% and 72.4% in the leukemia dataset, respectively.
• New MIP models for cost-sensitive feature selection are proposed.
• The models are robust enough to account for shared costs across feature groups.
• Aggressive bound tightening within a Branch and Cut algorithm was used.
• The proposed cost-sensitive feature selection models outperformed existing feature selection techniques.
• The models improved accuracy by up to 10.3% and decreased cost by up to 96%.
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2021.115902