Protein functional annotation refinement based on graph regularized ℓ1-norm PCA


Bibliographic Details
Published in	Pattern Recognition Letters Vol. 87; pp. 212-221
Main Authors	Sun, Dengdi, Liang, Huadong, Ge, Meiling, Ding, Zhuanlian, Cai, Wanting, Luo, Bin
Format	Journal Article
Language	English
Published	Elsevier B.V. 01.02.2017
Subjects	Function–Function correlations; Graph regularization; Protein–Protein interaction networks; Sparse coding; Tag refinement
More Information
Summary:	•We investigate the problem of refining imprecise and incomplete function tags.
•We propose a graph regularized ℓ1-norm PCA to address the refinement problem.
•The model incorporates error sparsity, PPI networks and function correlations.
•We present an algorithm to solve the proposed model and prove its convergence.
•Results show that our method can correct imprecise tags and fill in incomplete ones.

As various high-throughput experimental techniques are developed to characterize biological systems at the genome scale, vast amounts of protein function data have been collected. However, these functional tags are often imprecise and/or incomplete, resulting in unsatisfactory performance in function-related experiments and applications. The refinement of protein functional annotation has therefore become one of the fundamental challenges of the post-genomic era. In this paper, the refinement problem is formulated as a decomposition of the available functional tag matrix X, which contains noise and missing data, into latent protein features U and function representations V. Specifically, we aim to minimize the sparse error between the original tag matrix X and the recovered matrix, and we incorporate protein-protein interactions and function-function correlations to regularize the two latent feature matrices U and V, respectively. Finally, all these components are integrated into a convex optimization model, named graph regularized ℓ1-norm PCA, and an efficient, convergent iterative procedure based on the augmented Lagrangian multiplier method is proposed to solve it. We conduct extensive experiments on a yeast database, and the results indicate that our new refinement method consistently outperforms other related state-of-the-art approaches in terms of average precision and average F1-score.
ISSN:	0167-8655 1872-7344
DOI:	10.1016/j.patrec.2016.05.029
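
The summary names three ingredients of the model: a sparse (ℓ1) error between X and its reconstruction from U and V, a PPI-graph regularizer on U, and a function-correlation regularizer on V. This record does not give the exact formula, so the following is only a minimal sketch of one plausible objective of that shape. The Laplacian form of the regularizers, the weights `alpha` and `beta`, and all function names are assumptions made for illustration and are not taken from the paper; the sketch evaluates the objective and does not implement the augmented Lagrangian solver.

```python
# Illustrative sketch only. The objective form, the names, and the weights
# alpha/beta are assumptions; they are not taken from the paper.

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric weight matrix W."""
    n = len(W)
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

def smoothness(F, L):
    """tr(F^T L F): small when rows of F that are linked in the graph agree."""
    n, k = len(F), len(F[0])
    return sum(F[i][c] * L[i][j] * F[j][c]
               for i in range(n) for j in range(n) for c in range(k))

def objective(X, U, V, W_ppi, W_fun, alpha=1.0, beta=1.0):
    """||X - U V^T||_1 + alpha * tr(U^T L_ppi U) + beta * tr(V^T L_fun V)."""
    k = len(U[0])
    # Entrywise l1 norm of the residual between the tags and the reconstruction.
    l1_error = sum(
        abs(X[i][j] - sum(U[i][c] * V[j][c] for c in range(k)))
        for i in range(len(X)) for j in range(len(X[0]))
    )
    return (l1_error
            + alpha * smoothness(U, laplacian(W_ppi))
            + beta * smoothness(V, laplacian(W_fun)))

# Toy data: 3 proteins, 2 functions, rank-1 latent factors.
X = [[1, 0], [1, 0], [0, 1]]               # observed protein-function tags
U = [[1.0], [1.0], [0.0]]                  # latent protein features
V = [[1.0], [0.0]]                         # latent function representations
W_ppi = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]  # proteins 0 and 1 interact
W_fun = [[0, 0], [0, 0]]                   # no function-function correlation
value = objective(X, U, V, W_ppi, W_fun)
```

In this toy case the only contribution to `value` is the single tag that the rank-1 reconstruction misses. The interacting proteins 0 and 1 share the same latent feature, so the PPI term adds nothing.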