A Genetic Programming Approach to Radiomic-Based Feature Construction for Survival Prediction in Non-Small Cell Lung Cancer

Machine learning (ML) is commonly used to develop survival-predictive radiomic models in non-small cell lung cancer (NSCLC) patients, which helps assist treatment decision making. Radiomic features derived from computer tomography (CT) lung images aim to capture quantitative tumor characteristics. H...

Full description

Saved in:
Bibliographic Details
Published inApplied sciences Vol. 14; no. 16; p. 6923
Main Authors Scalco, Elisa, Gómez-Flores, Wilfrido, Rizzo, Giovanna
Format Journal Article
LanguageEnglish
Published Basel MDPI AG 01.08.2024
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Machine learning (ML) is commonly used to develop survival-predictive radiomic models in non-small cell lung cancer (NSCLC) patients, which helps assist treatment decision making. Radiomic features derived from computer tomography (CT) lung images aim to capture quantitative tumor characteristics. However, these features are determined by humans, which poses a risk of including irrelevant or redundant variables, thus reducing the model’s generalization. To address this issue, we propose using genetic programming (GP) to automatically construct new features with higher discriminant power than the original radiomic features. To achieve this goal, we introduce a fitness function that measures the classification performance ratio of output to input. The constructed features are then input for various classifiers to predict the two-year survival of NSCLC patients from two public CT datasets. Our approach is compared against two popular feature selection methods in radiomics to choose relevant radiomic features, and two GP-based feature construction methods whose fitness functions are based on measuring the constructed features’ quality. The experimental results show that survival prediction models trained on GP-based constructed features outperform feature selection methods. Also, maximizing the classification performance gain output-to-input ratio produces features with higher discriminative power than only maximizing the classification accuracy from constructed features. Furthermore, a survival analysis demonstrated statistically significant differences between survival and non-survival groups in the Kaplan–Meier curves. Therefore, the proposed approach can be used as a complementary method for oncologists in determining the clinical management of NSCLC patients.
ISSN:2076-3417
2076-3417
DOI:10.3390/app14166923