Modular decomposition of protein structure using community detection

Bibliographic Details
Published in	arXiv.org
Main Authors	Grant, William P; Ahnert, Sebastian E
Format	Paper Journal Article
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 18.09.2018
Subjects	Algorithms; Amino acids; Classification; Communities; Data banks; Decomposition; Granulation; Inspection; Modular structures; Molecular dynamics; Physics - Biological Physics; Proteins; Quantitative Biology - Biomolecules; Substructures; Topology

Summary:	As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Protein structures are known to be formed of domains: structural and functional subunits that are often repeated across sets of proteins. These domains generally form compact, globular regions and are therefore often easily identifiable by inspection, yet the problem of automatically fragmenting a protein into these compact substructures remains computationally challenging. Existing domain classification methods focus on finding subregions of protein structure that are conserved, rather than finding a decomposition that spans the full protein structure. However, such a decomposition would find ready application in coarse-graining molecular dynamics, in analysing the protein's topology, in de novo protein design, and in fitting electron microscopy maps. Here, we present a tool for performing this modular decomposition using the Infomap community detection algorithm. The protein structure is abstracted into a network in which the amino acids are the nodes and the edges are generated using a simple proximity test. Infomap can then be used to identify highly intra-connected regions of the protein. We perform this decomposition systematically across 4000 distinct protein structures taken from the Protein Data Bank. The decomposition obtained correlates well with existing Pfam sequence classifications, but has the advantage of spanning the full protein, with the potential for novel domains. The coarse-grained network formed by the communities can also be used as a proxy for protein topology at the single-chain level; we demonstrate that grouping these proteins by their coarse-grained network results in a functionally significant classification.
ISSN:	2331-8422
DOI:	10.48550/arxiv.1809.06632
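
Illustrative sketch:	The Summary describes abstracting a protein into a network of amino acids linked by a simple proximity test, then collapsing detected communities into a coarse-grained network. The minimal Python sketch below illustrates only those two bookkeeping steps and is not the authors' tool. The 8 Å cutoff, the use of one coordinate per residue, the toy coordinates, and the function names `contact_network` and `coarse_grain` are all assumptions made for illustration; the record does not specify them. The community assignment is supplied by hand here, standing in for the output of Infomap, which is not run in this sketch.

```python
from itertools import combinations
from math import dist


def contact_network(coords, cutoff=8.0):
    """Return edges (i, j), i < j, for residue pairs within `cutoff`.

    `coords` holds one 3D point per residue; the cutoff value is an
    assumption, not a parameter taken from the paper.
    """
    return {
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if dist(a, b) <= cutoff
    }


def coarse_grain(edges, community):
    """Count residue-level contacts between each pair of distinct communities."""
    weights = {}
    for i, j in edges:
        ci, cj = community[i], community[j]
        if ci != cj:
            key = (min(ci, cj), max(ci, cj))
            weights[key] = weights.get(key, 0) + 1
    return weights


# Hypothetical toy coordinates (angstroms): two compact groups of three residues.
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (10, 0, 0), (13, 0, 0), (10, 3, 0)]
edges = contact_network(coords)

# Hand-written partition standing in for a community detection result.
community = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
coarse = coarse_grain(edges, community)
```

In practice the edge set would be passed to a community detection implementation to obtain `community`; the resulting `coarse` dictionary is one possible representation of the coarse-grained network mentioned in the Summary.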