Classification of domains in predicted structures of the human proteome

Bibliographic Details
Published in Proceedings of the National Academy of Sciences - PNAS Vol. 120; no. 12; p. e2214069120
Main Authors Schaeffer, R Dustin; Zhang, Jing; Kinch, Lisa N; Pei, Jimin; Cong, Qian; Grishin, Nick V
Format Journal Article
Language English
Published United States National Academy of Sciences 21.03.2023
Summary: Recent advances in protein structure prediction have generated accurate structures of previously uncharacterized human proteins. Identifying domains in these predicted structures and classifying them into an evolutionary hierarchy can reveal biological insights. Here, we describe the detection and classification of domains from the human proteome. Our classification indicates that only 62% of residues are located in globular domains. We further classify these globular domains and observe that the majority (65%) can be classified among known folds by sequence, with a smaller fraction (33%) requiring structural data to refine the domain boundaries and/or to support their homology. A relatively small number (966 domains) cannot be confidently assigned using our automatic pipelines, thus demanding manual inspection. We classify 47,576 domains, of which only 23% have been included in experimental structures. A portion (6.3%) of these classified globular domains lack sequence-based annotation in InterPro. A quarter (23%) have not been structurally modeled by homology, and they contain 2,540 known disease-causing single amino acid variations whose pathogenesis can now be inferred using AF models. A comparison of classified domains from a series of model organisms revealed expansions of several immune response-related domains in humans and a depletion of olfactory receptors. Finally, we use this classification to expand well-known protein families of biological significance. These classifications are presented on the ECOD website (http://prodata.swmed.edu/ecod/index_human.php).
Bibliography: R.D.S., J.Z., Q.C., and N.V.G. contributed equally to this work.
Edited by Nicholas Polizzi, Harvard Medical School, Boston, MA 02215; received August 16, 2022; accepted February 6, 2023 by Editorial Board Member William F. DeGrado
ISSN: 0027-8424
1091-6490
DOI: 10.1073/pnas.2214069120