An Integrated Data Analysis Using Bioinformatics and Random Forest to Predict Prognosis of Patients With Squamous Cell Lung Cancer

Bibliographic Details
Published in IEEE Access, Vol. 12; pp. 59335-59345
Main Authors Lima, Debora V. C.; Terrematte, Patrick; Stransky, Beatriz; Neto, Adriao D. D.
Format Journal Article
Language English
Published Piscataway IEEE 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Lung cancer is the leading cause of cancer death worldwide, regardless of gender. Lung Squamous Cell Carcinoma (LUSC) is the second most common type of lung cancer and is characterized by diagnosis at advanced stages, a poor prognosis, and a strong association with smoking. Given the severity of lung cancer, it is essential to understand its molecular mechanisms. In this context, this study uses transcriptomic and clinical data to implement bioinformatics pipelines and machine learning, through random forest models, to predict patients' overall survival and to obtain a LUSC gene signature for tumor progression. We analyzed clinical and molecular data from the LUSC-TCGA project and performed differential expression analyses (DEA) comparing normal tissues against tumor tissues. Based on the DEA-selected genes, the patients were divided into three clusters, followed by feature selection and classification. The classification reached an accuracy close to 70% for the three clusters. We also performed a functional enrichment analysis. The clustering analysis revealed enriched genes in cluster 2, such as CDT1, CENPI, and NLGN1, associated with the molecular process of epithelial-to-mesenchymal transition (EMT). Our approach facilitated the identification of genes that are biologically relevant to LUSC development, including genes significant for predicting patient survival, such as ALDH3B1, C7, FAM83A, FOSB, GCGR, BMP7, PPP1R27, and AQP1, and putative therapeutic targets for LUSC, such as FAM83A, CAV1, TNS4, EIF4G1, TFAP2A, GCGR, and PPP1R27.
ISSN:2169-3536
DOI:10.1109/ACCESS.2024.3392277
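
Illustrative note: the summary describes selecting genes by differential expression analysis (DEA) between normal and tumor tissues before clustering and classification. The record does not say which DEA tool, statistical test, or thresholds the authors used, so the following is only a minimal sketch of the general idea, assuming a simple log2 fold-change filter combined with Welch's t-statistic. The gene names (`GENE_UP`, `GENE_DOWN`, `GENE_FLAT`), the cutoffs, and the synthetic expression values are all hypothetical and are not taken from the study.

```python
import math
import random
import statistics


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression (pseudocount avoids log of 0)."""
    return math.log2(
        (statistics.mean(tumor) + pseudocount) / (statistics.mean(normal) + pseudocount)
    )


def welch_t(a, b):
    """Welch's t-statistic for two samples with possibly unequal variances."""
    standard_error = math.sqrt(
        statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    )
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


def select_de_genes(expr, lfc_cutoff=1.0, t_cutoff=3.0):
    """Keep genes whose |log2FC| and |t| both pass the (assumed) cutoffs.

    expr maps gene name -> (tumor_values, normal_values).
    """
    selected = {}
    for gene, (tumor, normal) in expr.items():
        lfc = log2_fold_change(tumor, normal)
        t_stat = welch_t(tumor, normal)
        if abs(lfc) >= lfc_cutoff and abs(t_stat) >= t_cutoff:
            selected[gene] = (lfc, t_stat)
    return selected


# Synthetic expression values for three hypothetical genes (not study data).
rng = random.Random(0)


def simulate(mean, n=20, sd=5.0):
    """Draw n non-negative expression values around the given mean."""
    return [max(0.0, rng.gauss(mean, sd)) for _ in range(n)]


expr = {
    "GENE_UP": (simulate(200), simulate(50)),     # higher in tumor
    "GENE_DOWN": (simulate(40), simulate(160)),   # lower in tumor
    "GENE_FLAT": (simulate(100), simulate(100)),  # no real difference
}

de_genes = select_de_genes(expr)
for gene, (lfc, t_stat) in sorted(de_genes.items()):
    print(f"{gene}: log2FC={lfc:.2f}, t={t_stat:.1f}")
```

In a workflow like the one summarized above, the genes that pass such a filter would then feed the clustering, feature selection, and random forest classification steps. Dedicated DEA packages additionally model count distributions and correct for multiple testing, which this sketch omits.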