Prioritization of potential candidate disease genes by topological similarity of protein–protein interaction network and phenotype data

Bibliographic Details
Published in Journal of biomedical informatics Vol. 53; pp. 229-236
Main Authors Luo, Jiawei; Liang, Shiyu
Format Journal Article
Language English
Published United States Elsevier Inc 01.02.2015
Summary:
• We construct a reliable heterogeneous network by fusing multiple networks.
• We devise a random walk-based algorithm on the reliable heterogeneous network.
• Combining topological similarity with phenotype data helps to predict causal genes.
• The algorithm still performs well at low parameter values.

Identifying candidate disease genes is important for improving medical care, but the task remains challenging in the post-genomic era. Several computational approaches have been proposed to prioritize potential candidate genes using protein–protein interaction (PPI) networks. However, experimental PPI networks usually contain a number of spurious interactions. In this paper, we construct a reliable heterogeneous network by fusing multiple networks: a PPI network reconstructed by topological similarity, a phenotype similarity network, and known associations between diseases and genes. We then devise a random walk-based algorithm on this reliable heterogeneous network, called RWRHN, to prioritize potential candidate genes for inherited diseases. Leave-one-out cross-validation experiments show that RWRHN performs better than the RWRH and CIPHER methods in inferring disease genes. Furthermore, RWRHN is used to predict novel causal genes for 16 diseases, including breast cancer, diabetes mellitus type 2, and prostate cancer, and to detect disease-related protein complexes. The top predictions are supported by literature evidence.
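The random walk idea mentioned in the summary can be illustrated with a minimal sketch of a generic random walk with restart on a small single-layer graph. This is not the authors' RWRHN algorithm, which runs on a heterogeneous gene and phenotype network; the gene names, edge weights, restart probability, and function name below are all hypothetical and chosen only for illustration.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Generic random walk with restart on a weighted graph.

    adj maps each node to a dict {neighbor: weight}; every node must appear
    as a key. seeds is the set of nodes the walker restarts from.
    The restart value 0.7 is an arbitrary illustrative choice.
    """
    nodes = sorted(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    # Initial distribution: uniform over the seed nodes.
    p0 = {u: (1.0 / len(seeds) if u in seeds else 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to a seed.
        nxt = {u: restart * p0[u] for u in nodes}
        # Otherwise it moves to a neighbor, proportionally to edge weight.
        for u in nodes:
            if out_weight[u] == 0:
                continue  # isolated node: its walk mass is dropped
            for v, w in adj[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / out_weight[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy interaction graph (undirected, unit weights).
toy_graph = {
    "G1": {"G2": 1.0, "G3": 1.0},
    "G2": {"G1": 1.0, "G3": 1.0},
    "G3": {"G1": 1.0, "G2": 1.0, "G4": 1.0},
    "G4": {"G3": 1.0, "G5": 1.0},
    "G5": {"G4": 1.0},
}

# Treat G1 as the single known disease gene and rank the remaining genes.
scores = random_walk_with_restart(toy_graph, seeds={"G1"})
ranking = sorted((g for g in toy_graph if g != "G1"),
                 key=lambda g: scores[g], reverse=True)
print(ranking)
```

Genes that are closer to the seed, or connected to it through more paths, receive higher steady-state scores, which is the intuition behind using such walks to prioritize candidates.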
ISSN: 1532-0464, 1532-0480
DOI: 10.1016/j.jbi.2014.11.004