Computing Molecular Signatures as Optima of a Bi-Objective Function: Method and Application to Prediction in Oncogenomics

Bibliographic Details
Published in	Cancer Informatics, Vol. 2015, no. 14, pp. 33-45
Main Authors	Gardeux, Vincent; Chelouah, Rachid; Wanderley, Maria F. Barbosa; Siarry, Patrick; Braga, Antônio P.; Reyal, Fabien; Rouzier, Roman; Pusztai, Lajos; Natowicz, René
Format	Journal Article
Language	English
Published	London, England: SAGE Publications / Libertas Academica, 01.01.2015
Subjects	Bioinformatics; Computer Science; Methodology; bi-objective optimization; breast cancer; molecular signatures; filter method; feature selection

More Information
Summary:	Background: Filter feature selection methods compute molecular signatures by selecting subsets of genes from the ranking produced by a valuation function. The reasons for choosing a particular valuation function are almost always clearly stated, but the reasons for selecting genes according to their rank are hardly ever explicit. Method: We addressed the computation of molecular signatures by searching for the optima of a bi-objective function whose solution space was the set of all possible molecular signatures, i.e., the set of all subsets of genes. The two objectives were the size of the signature, to be minimized, and the interclass distance induced by the signature, to be maximized. Results: We showed that 1) the convex combination of the two objectives had exactly n optimal non-empty signatures, where n was the number of genes; 2) the n optimal signatures were nested; and 3) the optimal signature of size k was the subset of the k top-ranked genes, i.e., the k genes that contributed the most to the interclass distance. We applied our feature selection method to five public datasets in oncology and assessed the prediction performance of the optimal signatures as input to the diagonal linear discriminant analysis (DLDA) classifier. Performance was equal to or better than the best reported results, the predictions were robust, and the signatures were almost always significantly smaller. We studied in more detail the performance of our predictive modeling on two breast cancer datasets for predicting the response to preoperative chemotherapy: performance was higher than previously reported, the signatures were about three times smaller (11-gene versus 30-gene signatures), and the genes in the signatures were known to be involved in the response to chemotherapy. Conclusions: Defining molecular signatures as the optima of a bi-objective function that combines signature size and interclass distance was well founded and efficient for prediction in oncogenomics.
The computational complexity was very low because the optimal signatures were simply the successive top-ranked subsets of genes in the ranking by their valuation. Software can be freely downloaded from http://gardeux-vincent.eu/DeltaRanking.php
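The summary's central claim (a convex combination of signature size and interclass distance has n nested optima, each a prefix of the gene ranking) can be checked on a toy example. The Python sketch below is not the authors' DeltaRanking software. It assumes, hypothetically, that the interclass distance is additive over genes, d(S) = sum of δ_g for g in S, where δ_g is a per-gene score (here a standardized squared difference of class means, chosen only for illustration). Under that assumption the objective λ·d(S) - (1 - λ)·|S| splits into independent per-gene terms λ·δ_g - (1 - λ), so the optimum keeps exactly the genes with δ_g > (1 - λ)/λ, which is a top-k prefix of the ranking. The code compares this closed form with exhaustive search over all subsets, then feeds an optimal signature to a minimal DLDA classifier with equal class priors. All function names and the toy data are hypothetical.

```python
# Sketch only. ASSUMPTION (hypothetical, not from the paper): the interclass
# distance of a signature S is additive, d(S) = sum(delta[g] for g in S), with
# delta[g] >= 0 a per-gene score. The score below is an illustrative choice.
from itertools import combinations
from statistics import mean, pvariance


def gene_scores(X, y):
    """Hypothetical per-gene score: squared difference of class means divided
    by the average of the two class (population) variances. Labels are 0/1."""
    scores = []
    for g in range(len(X[0])):
        a = [x[g] for x, lab in zip(X, y) if lab == 0]
        b = [x[g] for x, lab in zip(X, y) if lab == 1]
        pooled = (pvariance(a) + pvariance(b)) / 2 or 1e-12  # guard against 0
        scores.append((mean(a) - mean(b)) ** 2 / pooled)
    return scores


def objective(subset, delta, lam):
    """Convex combination to maximize: lam * d(S) - (1 - lam) * |S|."""
    return lam * sum(delta[g] for g in subset) - (1 - lam) * len(subset)


def ranking_prefixes(delta):
    """The n nested candidates: top-1, top-2, ..., top-n genes by delta."""
    order = sorted(range(len(delta)), key=lambda g: delta[g], reverse=True)
    return [sorted(order[:k]) for k in range(1, len(delta) + 1)]


def optimal_by_threshold(delta, lam):
    """Closed form under additivity: keep g iff lam * delta[g] > 1 - lam."""
    return [g for g in range(len(delta)) if lam * delta[g] > 1 - lam]


def optimal_by_brute_force(delta, lam):
    """Exhaustive search over all non-empty subsets (tiny n only)."""
    best, best_val = [], 0.0  # the empty signature scores 0
    for k in range(1, len(delta) + 1):
        for subset in combinations(range(len(delta)), k):
            val = objective(subset, delta, lam)
            if val > best_val:
                best, best_val = list(subset), val
    return best


def dlda_fit(X, y, genes):
    """Diagonal LDA on the signature genes: per-class means and a pooled
    within-class variance per gene (equal class priors assumed)."""
    classes = sorted(set(y))
    means = {c: [mean(x[g] for x, lab in zip(X, y) if lab == c) for g in genes]
             for c in classes}
    dof = len(X) - len(classes)
    var = [sum((x[g] - means[lab][j]) ** 2 for x, lab in zip(X, y)) / dof or 1e-12
           for j, g in enumerate(genes)]
    return classes, means, var


def dlda_predict(model, x, genes):
    """Assign x to the class with the smallest variance-scaled distance."""
    classes, means, var = model
    return min(classes, key=lambda c: sum(
        (x[g] - means[c][j]) ** 2 / var[j] for j, g in enumerate(genes)))


if __name__ == "__main__":
    # Made-up toy data: 6 samples x 4 genes, labels 0/1.
    X = [[1.0, 5.0, 2.0, 0.0], [1.2, 5.1, 2.5, 0.1], [0.9, 4.9, 1.5, 0.2],
         [3.0, 5.3, 2.2, 0.1], [3.1, 5.4, 2.8, 0.3], [2.8, 5.2, 1.9, 0.0]]
    y = [0, 0, 0, 1, 1, 1]
    delta = gene_scores(X, y)
    prefixes = ranking_prefixes(delta)
    found = []  # distinct non-empty optima as lam sweeps over (0, 1)
    for i in range(1, 1000):
        lam = i / 1000
        opt = optimal_by_threshold(delta, lam)
        assert opt == optimal_by_brute_force(delta, lam)
        if opt and opt not in found:
            assert opt in prefixes  # every optimum is a top-k prefix
            found.append(opt)
    print(found)
    model = dlda_fit(X, y, found[0])
    print([dlda_predict(model, x, found[0]) for x in X])
```

On this toy data, sweeping λ yields [0], [0, 1], [0, 1, 2] and [0, 1, 2, 3]: one optimum per size, each nested in the next. Exhaustive search is exponential in n and serves only as a check; obtaining all n optima only requires sorting the gene scores once.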
ISSN:	1176-9351
DOI:	10.4137/CIN.S21111