Overlapping group screening for binary cancer classification with TCGA high-dimensional genomic data
Precision medicine has been a global trend of medical development, wherein cancer diagnosis plays an important role. With accurate diagnosis of cancer, we can provide patients with appropriate medical treatments for improving patients' survival. Since disease developments involve complex interp...
Saved in:
Published in | Journal of bioinformatics and computational biology Vol. 21; no. 3; p. 2350013 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published |
Singapore
01.06.2023
|
Subjects | |
Online Access | Get more information |
Cover
Loading…
Summary: | Precision medicine has been a global trend of medical development, wherein cancer diagnosis plays an important role. With accurate diagnosis of cancer, we can provide patients with appropriate medical treatments for improving patients' survival. Since disease developments involve complex interplay among multiple factors such as gene-gene interactions, cancer classifications based on microarray gene expression profiling data are expected to be effective, and hence, have attracted extensive attention in computational biology and medicine. However, when using genomic data to build a diagnostic model, there exist several problems to be overcome, including the high-dimensional feature space and feature contamination. In this paper, we propose using the overlapping group screening (OGS) approach to build an accurate cancer diagnosis model and predict the probability of a patient falling into some disease classification category in the logistic regression framework. This new proposal integrates gene pathway information into the procedure for identifying genes and gene-gene interactions associated with the classification of cancer outcome groups. We conduct a series of simulation studies to compare the predictive accuracy of our proposed method for cancer diagnosis with some existing machine learning methods, and find the better performances of the former method. We apply the proposed method to the genomic data of The Cancer Genome Atlas related to lung adenocarcinoma (LUAD), liver hepatocellular carcinoma (LHC), and thyroid carcinoma (THCA), to establish accurate cancer diagnosis models. |
---|---|
ISSN: | 1757-6334 |
DOI: | 10.1142/S0219720023500130 |