Benchmarking network propagation methods for disease gene identification

Bibliographic Details
Published in	PLoS Computational Biology Vol. 15; no. 9; p. e1007276
Main Authors	Picart-Armada, Sergio, Barrett, Steven J., Willé, David R., Perera-Lluna, Alexandre, Gutteridge, Alex, Dessailly, Benoit H.
Format	Journal Article Publication
Language	English
Published	United States: Public Library of Science (PLoS), 01.09.2019
Subjects	Algorithms; Artificial intelligence; Benchmarking; Bioengineering; Bioinformatics; Molecular biology; Biology and Life Sciences; Biomedical materials; Biomedical research; Business performance management; Health sciences; Computational Biology - methods; Computer and Information Sciences; Computer Simulation; Databases, Genetic; Design factors; Diffusion; Disease; Disease - genetics; Diseases; Drug discovery; Drug Discovery - methods; Drug therapy; Generalized linear models; Genes; Genetic aspects; Genetic research; Genetics; Genomes; Humans; Identification; Identification and classification; Identification methods; Information systems; Learning algorithms; Machine Learning; Medicine and Health Sciences; Methods; Statistical methods; Networks; Performance measurement; Pharmaceutical industry; Physical Sciences; Propagation; Protein seeding; Proteins; R&D; Research & development; Research and Analysis Methods; Software; Spain; Supervision; Target recognition; UPC subject areas; United Kingdom > UK
Summary:	In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, using gene-disease data for 22 common non-cancerous diseases from OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates because of the presence of protein complexes. To obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance to below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and of how seed disease genes are defined.
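The propagation and evaluation ideas in the summary can be illustrated with a small, self-contained sketch. It is not the authors' code. It assumes an unweighted, undirected network given as edge pairs, uses random walk with restart as one representative diffusion method and a simple neighbour-voting baseline, counts held-out positives among the top k candidates (the summary reports k = 20), and keeps genes that share a protein complex in the same cross-validation fold. All function names, the restart probability of 0.5 and the fold-balancing rule are hypothetical choices for illustration; the paper's two complex-aware schemes may work differently.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from (u, v) edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def random_walk_with_restart(adj, seeds, restart=0.5, n_iter=100):
    """Diffuse seed mass by power iteration:
    p <- restart * p0 + (1 - restart) * W p, with W = A D^-1
    (each node splits its current mass equally among its neighbours)."""
    nodes = sorted(adj)
    n_seeds = sum(n in seeds for n in nodes)
    p0 = {n: (1.0 / n_seeds if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return p


def neighbour_voting(adj, seeds):
    """Baseline: fraction of each node's neighbours that are seeds."""
    return {n: sum(v in seeds for v in adj[n]) / len(adj[n]) for n in adj}


def top_k_hits(scores, validation, exclude, k=20):
    """Number of held-out positives among the k best-scored candidates."""
    ranked = sorted((n for n in scores if n not in exclude),
                    key=lambda n: (-scores[n], n))
    return sum(n in validation for n in ranked[:k])


def complex_aware_folds(genes, complexes, n_folds=2):
    """Assign genes to folds so that genes sharing a complex are never split
    across folds (hypothetical grouping and balancing rule)."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    # Union all genes that co-occur in any complex.
    for cx in complexes:
        members = [g for g in cx if g in parent]
        for other in members[1:]:
            parent[find(other)] = find(members[0])

    groups = defaultdict(list)
    for g in genes:
        groups[find(g)].append(g)

    folds, sizes = {}, [0] * n_folds
    # Largest groups first, each placed in the currently smallest fold.
    for members in sorted(groups.values(), key=lambda m: (-len(m), m)):
        f = sizes.index(min(sizes))
        for g in members:
            folds[g] = f
        sizes[f] += len(members)
    return folds


edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("B", "D")]
adj = build_adjacency(edges)
rwr = random_walk_with_restart(adj, seeds={"A"})
nv = neighbour_voting(adj, seeds={"A"})
hits = top_k_hits(rwr, validation={"B", "E"}, exclude={"A"}, k=1)
folds = complex_aware_folds(list("ABCDE"), complexes=[{"B", "C"}])
```

In this toy network, diffusion gives every node a graded score that decays with distance from the seed, whereas neighbour voting scores only the seed's direct neighbours above zero. Swapping in another diffusion kernel, or feeding diffusion scores to a classifier as features, changes only the scoring function; the top-k evaluation and the grouped folds stay the same.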
Competing interests:	I have read the journal’s policy and the authors of this manuscript have the following competing interests: SJB, DRW, AG, and BHD are paid employees and shareholders of GlaxoSmithKline. The commercial affiliation of SJB, DRW, AG, and BHD does not alter our adherence to PLOS policies.
ISSN:	1553-7358, 1553-734X
DOI:	10.1371/journal.pcbi.1007276