Computational discovery of direct associations between GO terms and protein domains

Bibliographic Details
Published in	BMC Bioinformatics Vol. 19; no. Suppl 14; pp. 413 - 66
Main Authors	Alborzi, Seyed Ziaeddin; Ritchie, David W; Devignes, Marie-Dominique
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd, 20.11.2018
Subjects	Acids; Algorithms; Amino Acid Sequence; Analysis; Annotations; Area Under Curve; Bioinformatics; Computation; Computational Biology - methods; Computer applications; Computer Science; Data banks; Databases, Protein; Enzymes; Gene Ontology; Molecular Sequence Annotation; Ontology; Protein domain; Protein function; Protein structure; Proteins; Proteins - chemistry; Structure; Vector similarity
Summary:	Families of related proteins and their different functions may be described systematically using common classifications and ontologies such as Pfam and GO (Gene Ontology). However, many proteins consist of multiple domains, and each domain, or some combination of domains, can be responsible for a particular molecular function. Therefore, identifying which domains should be associated with a specific function is a non-trivial task. We describe a general approach for the computational discovery of associations between different sets of annotations by formalising the problem as a bipartite graph enrichment problem in the setting of a tripartite graph. We call this approach "CODAC" (for COmputational Discovery of Direct Associations using Common Neighbours). As one application of this approach, we describe "GODomainMiner" for associating GO terms with protein domains. We used GODomainMiner to predict GO-domain associations between each of the 3 GO ontology namespaces (MF, BP, and CC) and the Pfam, CATH, and SCOP domain classifications. Overall, compared to the existing GO annotations, GODomainMiner yields average enrichments of 15-, 41- and 25-fold in GO-domain associations for these 3 domain classifications, respectively. These associations could potentially be used to annotate many of the protein chains in the Protein Data Bank and protein sequences in UniProt whose domain composition is known but which currently lack GO annotation.
Bibliography:	PMCID: PMC6245584
ISSN:	1471-2105
DOI:	10.1186/s12859-018-2380-2