Highly accurate protein structure prediction for the human proteome

Bibliographic Details
Published in Nature (London) Vol. 596; no. 7873; pp. 590-596
Main Authors Tunyasuvunakool, Kathryn; Adler, Jonas; Wu, Zachary; Green, Tim; Zielinski, Michal; Žídek, Augustin; Bridgland, Alex; Cowie, Andrew; Meyer, Clemens; Laydon, Agata; Velankar, Sameer; Kleywegt, Gerard J.; Bateman, Alex; Evans, Richard; Pritzel, Alexander; Figurnov, Michael; Ronneberger, Olaf; Bates, Russ; Kohl, Simon A. A.; Potapenko, Anna; Ballard, Andrew J.; Romera-Paredes, Bernardino; Nikolov, Stanislav; Jain, Rishub; Clancy, Ellen; Reiman, David; Petersen, Stig; Senior, Andrew W.; Kavukcuoglu, Koray; Birney, Ewan; Kohli, Pushmeet; Jumper, John; Hassabis, Demis
Format Journal Article
Language English
Published London: Nature Publishing Group UK, 26.08.2021
Nature Publishing Group
Summary: Protein structures can provide invaluable information, both for reasoning about biological processes and for enabling interventions such as structure-based drug development or targeted mutagenesis. After decades of effort, 17% of the total residues in human protein sequences are covered by an experimentally determined structure [1]. Here we markedly expand the structural coverage of the proteome by applying the state-of-the-art machine learning method, AlphaFold [2], at a scale that covers almost the entire human proteome (98.5% of human proteins). The resulting dataset covers 58% of residues with a confident prediction, of which a subset (36% of all residues) have very high confidence. We introduce several metrics developed by building on the AlphaFold model and use them to interpret the dataset, identifying strong multi-domain predictions as well as regions that are likely to be disordered. Finally, we provide some case studies to illustrate how high-quality predictions could be used to generate biological hypotheses. We are making our predictions freely available to the community and anticipate that routine large-scale and high-accuracy structure prediction will become an important tool that will allow new questions to be addressed from a structural perspective.
AlphaFold is used to predict the structures of almost all of the proteins in the human proteome; the availability of high-confidence predicted structures could enable new avenues of investigation from a structural perspective.
ISSN: 0028-0836
1476-4687
DOI: 10.1038/s41586-021-03828-1