Comparison of pathway and gene-level models for cancer prognosis prediction

Bibliographic Details
Published in	BMC bioinformatics Vol. 21; no. 1; pp. 76 - 17
Main Authors	Zheng, Xingyu; Amos, Christopher I; Frost, H Robert
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 28.02.2020
Subjects	Cancer; Cancer genetics; Cancer prognosis prediction; Cohort Studies; Comparative analysis; Computational efficiency; Computer simulation; Computing costs; Computing time; Correlation analysis; Gene Expression; Gene expression data; Genes; Genomes; Genomics; Glioma - genetics; Glioma - mortality; Gliomas; Health care reform; Humans; Inter-gene correlation; L1 penalized regression model; Medical prognosis; Methodology; Models, Genetic; Neoplasms - genetics; Neoplasms - mortality; Pathway analysis; Performance prediction; Permutations; Power (Philosophy); Power efficiency; Prediction models; Prognosis; Proportional Hazards Models; Robustness (mathematics); Statistical models; Survival

More Information
Summary:	Cancer prognosis prediction is valuable for patients and clinicians because it allows them to manage care appropriately. A promising direction for improving the performance and interpretation of expression-based predictive models involves the aggregation of gene-level data into biological pathways. While many studies have used pathway-level predictors for cancer survival analysis, a comprehensive comparison of pathway-level and gene-level prognostic models has not been performed. To address this gap, we characterized the performance of penalized Cox proportional hazards models built using either pathway- or gene-level predictors for the cancers profiled in The Cancer Genome Atlas (TCGA) and pathways from the Molecular Signatures Database (MSigDB). When analyzing TCGA data, we found that pathway-level models are more parsimonious, more robust, more computationally efficient and easier to interpret than gene-level models with similar predictive performance. For example, both pathway-level and gene-level models have an average Cox concordance index of ~0.85 for the TCGA glioma cohort; however, the gene-level model has twice as many predictors on average, its predictor composition is less stable across cross-validation folds, and its estimation takes 40 times as long as that of the pathway-level model. When the complex correlation structure of the data is broken by permutation, the pathway-level model has greater predictive performance while still retaining superior interpretative power, robustness, parsimony and computational efficiency relative to the gene-level models. For example, for the TCGA glioma cohort, using survival times simulated from uncorrelated gene expression data, the average concordance index of the pathway-level model increases to 0.88 while that of the gene-level model falls to 0.56.
The results of this study show that when the correlations among gene expression values are low, pathway-level analyses can yield better predictive performance, greater interpretative power, more robust models and lower computational cost than a gene-level model. When correlations among genes are high, a pathway-level analysis provides predictive power equivalent to that of a gene-level analysis while retaining the advantages of interpretability, robustness and computational efficiency.
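The two quantities the summary relies on, pathway-level predictors and the concordance index, can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how genes are aggregated into pathways, so averaging member genes is an assumption made here for illustration, and the function names `pathway_scores` and `concordance_index` are hypothetical. The concordance index follows the usual Harrell definition, simplified to skip pairs with tied survival times.

```python
from itertools import combinations


def pathway_scores(expression, pathways):
    """Collapse gene-level values into one score per pathway per sample.

    ASSUMPTION: the score is the mean of the member genes; the source
    does not state which aggregation method was used.
    expression: list of dicts, one per sample, mapping gene -> value.
    pathways:   dict mapping pathway name -> list of member genes.
    """
    return {
        name: [sum(sample[g] for g in genes) / len(genes) for sample in expression]
        for name, genes in pathways.items()
    }


def concordance_index(times, events, risk_scores):
    """Harrell-style C-index: share of comparable pairs ranked correctly.

    A pair is comparable when the subject with the shorter observed time
    had an event (was not censored). Pairs with tied times are skipped
    in this simplified sketch; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        # a is the subject with the shorter observed time
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the order is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not taken from TCGA)
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.7, 0.8, 0.1],
)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the reported 0.85, 0.88 and 0.56 should be read.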
ISSN:	1471-2105
DOI:	10.1186/s12859-020-3423-z