Graph identification of proteins in tomograms (GRIP‐Tomo)

Bibliographic Details
Published in	Protein Science, Vol. 32, no. 1, article e4538
Main Authors	George, August; Kim, Doo Nam; Moser, Trevor; Gildea, Ian T.; Evans, James E.; Cheung, Margaret S.
Format	Journal Article
Language	English
Published	Hoboken, USA: John Wiley & Sons, Inc; Wiley Subscription Services, Inc; 01.01.2023
Subjects	Centroids; Data mining; Defects; dimensional reduction; Full‐length Paper; graph theory; Graphs; Human bias; Humans; Mathematical morphology; multimeric structure; native structure; network theory; Order parameters; Pattern analysis; Proteins; Simulation; Topology

Summary:	In this study, we present a method of pattern mining based on network theory that enables the identification of protein structures or complexes from synthetic volume densities, without knowledge of predefined templates or human biases for refinement. We hypothesized that the topological connectivity of protein structures is invariant and distinctive enough to identify proteins from distorted data presented in volume densities. Three‐dimensional densities of a protein or a complex from simulated tomographic volumes were transformed into mathematical graphs as observables. We systematically introduced data distortion or defects such as missing fullness of data, the tumbling effect, and the missing wedge effect into the simulated volumes, and varied the distance cutoffs in pixels to capture the varying connectivity between the density cluster centroids in the presence of defects. A similarity score between the graphs from the simulated volumes and the graphs transformed from the physical protein structures in point data was calculated by comparing their network theory order parameters, including node degrees, betweenness centrality, and graph densities. By capturing the essential topological features defining the heterogeneous morphologies of a network, we were able to accurately identify proteins and homo‐multimeric complexes from 10 topologically distinctive samples without realistic noise added. Our approach empowers future developments of tomogram processing by providing pattern mining with interpretability, to enable the classification of single‐domain protein native topologies as well as distinct single‐domain proteins from multimeric complexes within noisy volumes.
Bibliography:	Review Editor: John Kuriyan. Funding information: U.S. Department of Energy, Grant/Award Number: DE‐AC05‐76RL01830
ISSN:	0961-8368, 1469-896X
DOI:	10.1002/pro.4538
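
Illustrative sketch (not part of the catalog record or the authors' code): the Summary describes turning density cluster centroids into a graph with a distance cutoff and comparing graphs through network order parameters. The Python below is a minimal sketch of that idea under stated assumptions. The function names (build_graph, graph_features, similarity), the toy coordinates, and the scoring formula are all hypothetical; the paper's actual similarity score may be defined differently. Betweenness centrality, which the Summary also lists, is omitted here for brevity.

```python
import math
from itertools import combinations


def build_graph(points, cutoff):
    """Connect every pair of points whose distance is at most `cutoff`.

    `points` stands in for density cluster centroids (assumed 3D tuples);
    `cutoff` is in the same units as the coordinates (pixels in the paper).
    """
    adjacency = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= cutoff:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def graph_features(adjacency):
    """Return two simple order parameters: mean node degree and graph density."""
    n = len(adjacency)
    n_edges = sum(len(nbrs) for nbrs in adjacency.values()) // 2
    mean_degree = 2 * n_edges / n if n else 0.0
    density = 2 * n_edges / (n * (n - 1)) if n > 1 else 0.0
    return {"mean_degree": mean_degree, "density": density}


def similarity(features_a, features_b):
    """Hypothetical score in [0, 1]: one minus the mean relative difference."""
    diffs = []
    for key in features_a:
        a, b = features_a[key], features_b[key]
        scale = max(abs(a), abs(b))
        diffs.append(abs(a - b) / scale if scale else 0.0)
    return 1.0 - sum(diffs) / len(diffs)


# Toy data (assumed): a straight chain of four centroids, and a "defective"
# copy with the last centroid missing.
reference_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
distorted_points = reference_points[:3]

reference = graph_features(build_graph(reference_points, cutoff=1.5))
distorted = graph_features(build_graph(distorted_points, cutoff=1.5))
score = similarity(reference, distorted)
print(reference, distorted, score)
```

Varying `cutoff` changes which centroids count as connected, which is how the Summary describes probing connectivity in the presence of defects; a score of 1.0 here means the two feature sets are identical.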