RNA sequencing-based histological subtyping of non-small cell lung cancer with generative adversarial data imputation

Bibliographic Details
Published in: 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI), pp. 1-4
Main Authors: Saber, Ralph; Routy, Bertrand; Turcotte, Simon; Kadoury, Samuel
Format: Conference Proceeding
Language: English
Published: IEEE, 15.10.2023
Summary: Non-small cell lung cancer (NSCLC) is the most common type of lung cancer and is classified into two main histological subtypes: adenocarcinoma and squamous cell carcinoma. Identifying the histological subtype is a crucial step in the diagnosis of NSCLC. RNA sequencing data hold valuable biological information but may contain missing gene expression counts, which limits their use in practice. In this work, we address the issue of missing gene expression data in NSCLC histological subtype prediction from RNA sequencing. To this end, we propose a pipeline that uses the generative adversarial imputation network (GAIN) to generate plausible imputations of missing data and tree-based ensemble models to predict the NSCLC histological subtype. We adopted a nested cross-validation scheme to evaluate the classification models. The proposed pipeline achieved an area under the receiver operating characteristic curve of 0.98 ± 0.03 and an accuracy of 0.96 ± 0.05 with the Light Gradient Boosting Machine. Experimental results showed that GAIN-derived imputations improve classification performance. Finally, we used the Shapley Additive Explanations technique and found a set of genes that were the most relevant for NSCLC subtyping across different models.
ISSN: 2641-3604
DOI: 10.1109/BHI58575.2023.10313485
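
The evaluation scheme described in the summary (imputation, then a tree-based ensemble, scored by nested cross-validation) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the data are synthetic, scikit-learn's `SimpleImputer` stands in for GAIN, `GradientBoostingClassifier` stands in for the Light Gradient Boosting Machine, and the fold counts and hyperparameter grid are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

rng = np.random.default_rng(0)

# ASSUMPTION: synthetic stand-in for an expression matrix (samples x genes)
# with two classes playing the role of the two histological subtypes.
X, y = make_classification(
    n_samples=120, n_features=20, n_informative=8, random_state=0
)

# Knock out about 10% of entries at random to mimic missing expression counts.
missing_mask = rng.random(X.shape) < 0.10
X_missing = X.copy()
X_missing[missing_mask] = np.nan

# ASSUMPTION: median imputation replaces GAIN, and scikit-learn's gradient
# boosting replaces LightGBM, so the sketch needs no extra dependencies.
# Keeping the imputer inside the pipeline means it is refit on each training
# fold only, so no information leaks from the held-out fold.
pipeline = Pipeline(
    [
        ("impute", SimpleImputer(strategy="median")),
        ("clf", GradientBoostingClassifier(random_state=0)),
    ]
)

# Inner loop: hyperparameter selection. Outer loop: performance estimate.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

search = GridSearchCV(
    pipeline,
    param_grid={"clf__n_estimators": [50, 100], "clf__max_depth": [2, 3]},
    scoring="roc_auc",
    cv=inner_cv,
)

# Each outer fold reruns the whole inner search on its training split.
auc_scores = cross_val_score(search, X_missing, y, scoring="roc_auc", cv=outer_cv)
print(f"AUC: {auc_scores.mean():.2f} ± {auc_scores.std():.2f}")
```

Reporting the mean and standard deviation over the outer folds mirrors the "value ± spread" form used in the summary, although the numbers produced here come from synthetic data and say nothing about the published results.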