RNA sequencing-based histological subtyping of non-small cell lung cancer with generative adversarial data imputation
Published in | 2023 IEEE EMBS International Conference on Biomedical and Health Informatics (BHI) pp. 1 - 4 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 15.10.2023 |
Subjects | |
Summary: | Non-small cell lung cancer (NSCLC) is the most common type of lung cancer and is classified into two main histological subtypes: adenocarcinoma and squamous cell carcinoma. Identifying the histological subtype is a crucial step in the diagnosis of NSCLC. RNA sequencing data hold valuable biological information but may contain missing gene expression counts, which limits their use in practice. In this work, we address the issue of missing gene expression data in NSCLC histological subtype prediction from RNA sequencing. To this end, we propose a pipeline that combines the generative adversarial imputation network (GAIN), which generates plausible imputations of missing values, with tree-based ensemble models for NSCLC histological subtype prediction. We adopted a nested cross-validation scheme to evaluate the classification models. The best results were obtained with the Light Gradient Boosting Machine, which reached an area under the receiver operating characteristic curve of 0.98 ± 0.03 and an accuracy of 0.96 ± 0.05. Experimental results showed that GAIN-derived imputations improve classification performance. Finally, we used the Shapley Additive Explanations (SHAP) technique and identified a set of genes that were the most relevant for NSCLC subtyping across different models. |
---|---|
ISSN: | 2641-3604 |
DOI: | 10.1109/BHI58575.2023.10313485 |
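The record does not describe the GAIN setup in detail. As a rough orientation, the following is a minimal sketch of the data-handling steps of the generic GAIN scheme applied to a samples-by-genes expression matrix. It covers the missingness mask M, the noise-filled generator input, the hint matrix given to the discriminator, and the final merge that keeps observed values untouched. The generator and discriminator networks and their adversarial training are omitted. `placeholder_generator` is hypothetical and only returns per-gene means, so the output here is plain mean imputation rather than a GAIN result. All function names, the U(0, 0.01) noise range, the Bernoulli hint rate of 0.9 and the toy counts are assumptions for illustration, not the authors' implementation.

```python
import random

def make_mask(data):
    # M[i][j] = 1 if gene j is observed for sample i, else 0 (missing = None)
    return [[0 if v is None else 1 for v in row] for row in data]

def normalize(data):
    # Per-gene min-max scaling to [0, 1] using observed values only
    # (assumes every gene has at least one observed value)
    cols = list(zip(*data))
    lo = [min(v for v in c if v is not None) for c in cols]
    hi = [max(v for v in c if v is not None) for c in cols]
    scaled = [[None if v is None else (v - lo[j]) / (hi[j] - lo[j] + 1e-6)
               for j, v in enumerate(row)] for row in data]
    return scaled, lo, hi

def generator_input(scaled, mask, rng):
    # X_tilde = M * X + (1 - M) * Z, with Z ~ U(0, 0.01) in the missing slots
    return [[v if m else rng.uniform(0.0, 0.01) for v, m in zip(row, mrow)]
            for row, mrow in zip(scaled, mask)]

def hint(mask, rng, hint_rate=0.9):
    # H = B * M + 0.5 * (1 - B): where B = 1 the discriminator sees the true
    # mask entry, where B = 0 it only sees 0.5 ("unknown")
    out = []
    for mrow in mask:
        b = [1 if rng.random() < hint_rate else 0 for _ in mrow]
        out.append([bj * mj + 0.5 * (1 - bj) for bj, mj in zip(b, mrow)])
    return out

def placeholder_generator(x_tilde, mask):
    # HYPOTHETICAL stand-in for the trained generator G(X_tilde, M):
    # predicts each gene's mean over its observed entries
    cols, mcols = list(zip(*x_tilde)), list(zip(*mask))
    means = [sum(v for v, m in zip(c, mc) if m) / max(sum(mc), 1)
             for c, mc in zip(cols, mcols)]
    return [means[:] for _ in x_tilde]

def combine(scaled, mask, g_out):
    # X_hat = M * X + (1 - M) * G(...): observed values are never overwritten
    return [[v if m else g for v, m, g in zip(row, mrow, grow)]
            for row, mrow, grow in zip(scaled, mask, g_out)]

def denormalize(scaled, lo, hi):
    # Invert the min-max scaling gene by gene
    return [[v * (hi[j] - lo[j] + 1e-6) + lo[j] for j, v in enumerate(row)]
            for row in scaled]

# Toy expression counts (3 samples x 3 genes), None marks a missing count
counts = [[10.0, None, 3.0],
          [20.0, 5.0, None],
          [None, 7.0, 9.0]]
rng = random.Random(0)
mask = make_mask(counts)
scaled, lo, hi = normalize(counts)
x_tilde = generator_input(scaled, mask, rng)
h = hint(mask, rng)  # would be fed to the discriminator during training
imputed = denormalize(combine(scaled, mask, placeholder_generator(x_tilde, mask)), lo, hi)
```

In a full GAIN setup, the placeholder would be replaced by a neural generator trained adversarially against a discriminator that receives `h` and tries to predict `mask`. The completed matrix would then be passed to the tree-based classifiers.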