Addition of Pathway-Based Information to Improve Predictions in Transcriptomics

Bibliographic Details
Published in	Bioinformatics and Biomedical Engineering pp. 200 - 208
Main Authors	Urda, Daniel; Veredas, Francisco J.; Turias, Ignacio; Franco, Leonardo
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing
Series	Lecture Notes in Computer Science
Subjects	Machine learning; Next-generation sequencing; Predictive modelling; Problem-specific information
Summary:	The diagnosis and prognosis of cancer are among the most critical challenges that modern medicine confronts. Personalized medicine aims to use data from heterogeneous sources to estimate the evolution of the disease for each specific patient in order to select the most appropriate treatment. In recent years, DNA sequencing data have boosted cancer prediction and treatment by supplying genetic information used to design genetic signatures or biomarkers. These have led to a better classification of the different subtypes of cancer, as well as to a better estimation of the evolution of the disease and the response to diverse treatments. Several machine learning models have been proposed in the literature for cancer prediction. However, the efficacy of these models can be seriously affected by the imbalance between the high dimensionality of the gene expression feature sets and the small number of samples available, a problem known as the curse of dimensionality. Although linear predictive models may perform worse than more sophisticated non-linear models, they have the key advantage of being interpretable. Moreover, domain-specific information has proven useful for boosting the performance of multivariate linear predictors in high-dimensional settings. In this work, we design a set of linear predictive models that incorporate domain-specific information from genetic pathways for effective feature selection. By combining these linear models with other classical machine learning models, we obtain state-of-the-art performance rates in the prediction of vital status on a public cancer dataset.
ISBN:	3030179346 9783030179342
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-030-17935-9_19