Identification of gene pairs through penalized regression subject to constraints


Bibliographic Details
Published in	BMC bioinformatics Vol. 18; no. 1; pp. 466 - 11
Main Authors	Shen, Rex, Luo, Lan, Jiang, Hui
Format	Journal Article
Language	English
Published	London: BioMed Central, 03.11.2017
Subjects	ADMM; Algorithms; Analysis; Bioinformatics; Biomarker; Biomarkers, Tumor - genetics; Biomarkers, Tumor - metabolism; Biomedical and Life Sciences; Computational biology; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Gene pair; Genetic aspects; Genotype; Humans; Identification and classification; Keratin-15 - genetics; Keratin-15 - metabolism; Life Sciences; Linear Models; Male; Methodology; Methodology Article; Methods; Microarrays; Penalized regression; Phenotype; Prostate cancer; Prostate-Specific Antigen - genetics; Prostate-Specific Antigen - metabolism; Prostatic Neoplasms - diagnosis; Prostatic Neoplasms - genetics; Prostatic Neoplasms - metabolism; Real-Time Polymerase Chain Reaction; RNA, Neoplasm - genetics; RNA, Neoplasm - metabolism; Transcriptome analysis
ISSN	1471-2105
DOI	10.1186/s12859-017-1872-9

Summary:	Background: This article concerns the identification of gene pairs, or combinations of gene pairs, associated with a biological phenotype or clinical outcome. Such pairs allow building predictive models that are robust to normalization and easily validated and measured by qPCR techniques. However, given a small number of biological samples and a large number of genes, the problem is computationally expensive and statistically challenging for accurate identification. Results: In this paper, we propose a parsimonious model representation and develop efficient algorithms for identification. In particular, we derive an equivalent model subject to a sum-to-zero constraint in penalized linear regression and establish the correspondence between nonzero coefficients in the two models. Most importantly, this reduces the model complexity of the traditional approach from quadratic to linear order in the number of candidate genes, while overcoming the difficulty of model nonidentifiability. Computationally, we develop an algorithm using the alternating direction method of multipliers (ADMM) to handle the constraint. Numerically, we demonstrate that the proposed method outperforms the traditional method in statistical accuracy. Moreover, we demonstrate that our ADMM algorithm is more computationally efficient than a coordinate descent algorithm with a local search. Finally, we illustrate the proposed method on a prostate cancer dataset to identify gene pairs that are associated with pre-operative prostate-specific antigen. Conclusion: Our findings demonstrate the feasibility and utility of using gene pairs as biomarkers.
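
Illustration (not part of the record): the summary describes a penalized linear regression with a sum-to-zero constraint solved by ADMM. The idea rests on a simple identity: any weighted sum of pairwise differences x_j - x_k can be rewritten as a weighted sum of the individual x_j whose coefficients add up to zero, so a model over all gene pairs (quadratic in the number of genes) can be replaced by one over single genes (linear). The following is a minimal sketch of that idea, assuming a lasso (L1) penalty and a standard ADMM splitting; it is not the authors' implementation. The function names, the penalty choice, the 1/(2n) loss scaling, and the defaults `rho` and `n_iter` are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def sum_to_zero_lasso_admm(X, y, lam, rho=1.0, n_iter=1000):
    """Sketch of ADMM for
        minimize (1/(2n)) * ||y - X beta||^2 + lam * ||beta||_1
        subject to sum(beta) == 0.
    Hypothetical helper for illustration; returns (beta, z), where beta
    satisfies the constraint exactly and z is the sparse copy.
    """
    n, p = X.shape
    ones = np.ones(p)
    # Matrix of the quadratic beta-subproblem; inverted once (fine for small p).
    A_inv = np.linalg.inv(X.T @ X / n + rho * np.eye(p))
    A_inv_ones = A_inv @ ones
    Xty = X.T @ y / n

    beta = np.zeros(p)
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable for the splitting beta = z
    for _ in range(n_iter):
        # beta-update: ridge-like solve, then an exact correction so that
        # the coefficients sum to zero (from the KKT conditions).
        beta0 = A_inv @ (Xty + rho * (z - u))
        beta = beta0 - A_inv_ones * (ones @ beta0) / (ones @ A_inv_ones)
        # z-update: soft-thresholding produces the sparsity.
        z = soft_threshold(beta + u, lam / rho)
        # dual update.
        u = u + beta - z
    return beta, z


# Toy data (simulated, not from the article): the outcome depends on the
# difference between "gene" 0 and "gene" 1 on a log-expression-like scale.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))
y = X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=100)

beta, z = sum_to_zero_lasso_admm(X, y, lam=0.05)
print("sparse coefficients:", np.round(z, 3))
print("sum of constrained coefficients:", beta.sum())
```

In this toy setup, a positive coefficient on one gene matched by a negative coefficient on another can be read as selecting that gene pair, which is one way to interpret the correspondence between nonzero coefficients mentioned in the summary.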