Linearity of network proximity measures: implications for set-based queries and significance testing

In recent years, various network proximity measures have been proposed to facilitate the use of biomolecular interaction data in a broad range of applications. These applications include functional annotation, disease gene prioritization, comparative analysis of biological systems and prediction of...

Full description

Saved in:
Bibliographic Details
Published inBioinformatics (Oxford, England) Vol. 33; no. 9; pp. 1354 - 1361
Main Authors Maxwell, Sean, Chance, Mark R, Koyutürk, Mehmet
Format Journal Article
LanguageEnglish
Published England Oxford University Press 01.05.2017
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In recent years, various network proximity measures have been proposed to facilitate the use of biomolecular interaction data in a broad range of applications. These applications include functional annotation, disease gene prioritization, comparative analysis of biological systems and prediction of new interactions. In such applications, a major task is the scoring or ranking of the nodes in the network in terms of their proximity to a given set of 'seed' nodes (e.g. a group of proteins that are identified to be associated with a disease, or are deferentially expressed in a certain condition). Many different network proximity measures are utilized for this purpose, and these measures are quite diverse in terms of the benefits they offer. We propose a unifying framework for characterizing network proximity measures for set-based queries. We observe that many existing measures are linear, in that the proximity of a node to a set of nodes can be represented as an aggregation of its proximity to the individual nodes in the set. Based on this observation, we propose methods for processing of set-based proximity queries that take advantage of sparse local proximity information. In addition, we provide an analytical framework for characterizing the distribution of proximity scores based on reference models that accurately capture the characteristics of the seed set (e.g. degree distribution and biological function). The resulting framework facilitates computation of exact figures for the statistical significance of network proximity scores, enabling assessment of the accuracy of Monte Carlo simulation based estimation methods. Implementations of the methods in this paper are available at https://bioengine.case.edu/crosstalker which includes a robust visualization for results viewing. stm@case.edu or mxk331@case.edu. Supplementary data are available at Bioinformatics online.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1367-4803
1367-4811
DOI:10.1093/bioinformatics/btw733