biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data

Bibliographic Details
Published in	Frontiers in molecular biosciences Vol. 3; p. 26
Main Authors	Rinaudo, Philippe; Boudah, Samia; Junot, Christophe; Thévenot, Etienne A.
Format	Journal Article
Language	English
Published	Switzerland: Frontiers Media S.A., 21.06.2016
Subjects	Biochemistry, Molecular Biology; Computer Science; Data Analysis, Statistics and Probability; Genomics; Information Retrieval; Life Sciences; Molecular Biosciences; Physics; Quantitative Methods; omics data; support vector machine; molecular signature; random forest; biomarker; partial least squares; feature selection; proteomics; data mining; diabetic patients; wrapper approach; bile; biosigner algorithm; reference binary classifier; Workflow4metabolomics; taurochenodeoxycholic acid; metabolomics; transcriptomics; discovery of biomarkers
Summary:	High-throughput technologies such as transcriptomics, proteomics, and metabolomics show great promise for the discovery of biomarkers for diagnosis and prognosis. Selection of the most promising candidates between the initial untargeted step and the subsequent validation phases is critical within the pipeline leading to clinical tests. Several statistical and data mining methods have been described for feature selection: in particular, wrapper approaches iteratively assess the performance of the classifier on distinct subsets of variables. Current wrappers, however, do not estimate the significance of the selected features. We therefore developed a new methodology to find the smallest feature subset which significantly contributes to the model performance, by using a combination of resampling, ranking of variable importance, significance assessment by permutation of the feature values in the test subsets, and half-interval search. We wrapped our biosigner algorithm around three reference binary classifiers (Partial Least Squares-Discriminant Analysis, Random Forest, and Support Vector Machines) which have been shown to achieve specific performances depending on the structure of the dataset. By using three real biological and clinical metabolomics and transcriptomics datasets (containing up to 7000 features), complementary signatures were obtained in a few minutes, generally providing higher prediction accuracies than the initial full model. Comparison with alternative feature selection approaches further indicated that our method provides signatures of restricted size and high stability. Finally, by using our methodology to seek metabolites discriminating type 1 from type 2 diabetic patients, several features were selected, including a fragment from the taurochenodeoxycholic bile acid. 
Our methodology, implemented in the biosigner R/Bioconductor package and Galaxy/Workflow4metabolomics module, should be of interest for both experimenters and statisticians to identify robust molecular signatures from large omics datasets in the process of developing new diagnostics.
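The summary's step of "significance assessment by permutation of the feature values in the test subsets" can be illustrated with a minimal Python sketch. This is not the biosigner implementation (which is an R/Bioconductor package): the nearest-centroid classifier, the toy data, and every function name below are assumptions chosen for illustration, and the resampling, importance ranking, and half-interval search steps are omitted.

```python
import random

def centroids(X, y):
    """Per-class mean vector for a two-class dataset (labels 0 and 1)."""
    cents = {}
    for label in (0, 1):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(cents, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    dist = {lab: sum((a - b) ** 2 for a, b in zip(x, c)) for lab, c in cents.items()}
    return min(dist, key=dist.get)

def accuracy(cents, X, y):
    return sum(predict(cents, x) == lab for x, lab in zip(X, y)) / len(y)

def feature_p_value(cents, X_test, y_test, j, n_perm=200, seed=0):
    """Permutation p-value for feature j, shuffled in the test subset only.

    Counts how often the model, left unchanged, does at least as well
    when the values of feature j are randomly reassigned across test samples.
    """
    rng = random.Random(seed)
    base = accuracy(cents, X_test, y_test)
    hits = 0
    for _ in range(n_perm):
        col = [x[j] for x in X_test]
        rng.shuffle(col)
        X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X_test, col)]
        if accuracy(cents, X_perm, y_test) >= base:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

def make_data(n, step):
    """Hypothetical toy data: feature 0 tracks the label, feature 1 does not."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([4.0 * label + step * (i % 5), float((i // 2) % 2)])
        y.append(label)
    return X, y

X_train, y_train = make_data(20, 0.10)
X_test, y_test = make_data(20, 0.15)
model = centroids(X_train, y_train)

p_info = feature_p_value(model, X_test, y_test, j=0)   # informative feature
p_noise = feature_p_value(model, X_test, y_test, j=1)  # uninformative feature
print(p_info, p_noise)
```

In this toy setting, shuffling the informative feature destroys the test accuracy, so its p-value is small, whereas shuffling the uninformative feature leaves every prediction unchanged. A wrapper in the spirit of the summary would repeat such a test over resampled train/test splits and discard the features that do not reach significance.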
Bibliography:	PMCID: PMC4914951. Edited by: Wolfram Weckwerth, University of Vienna, Austria. Reviewed by: Tomohisa Hasunuma, Kobe University, Japan; Michal Jan Markuszewski, Medical University of Gdansk, Poland. This article was submitted to Metabolomics, a section of the journal Frontiers in Molecular Biosciences.
ISSN:	2296-889X
DOI:	10.3389/fmolb.2016.00026