Learning the local landscape of protein structures with convolutional neural networks

Bibliographic Details
Published in	Journal of biological physics Vol. 47; no. 4; pp. 435 - 454
Main Authors	Kulikova, Anastasiya V., Diaz, Daniel J., Loy, James M., Ellington, Andrew D., Wilke, Claus O.
Format	Journal Article
Language	English
Published	Dordrecht: Springer Netherlands, 01.12.2021; Springer Nature B.V
Subjects	Amino Acid Sequence; Amino Acids; Biochemistry; Biological and Medical Physics; Biophysics; Complex Fluids and Microfluidics; Complex Systems; Conserved sequence; Mutation; Neural networks; Neural Networks, Computer; Neurosciences; Nucleotide sequence; Original Paper; Physics; Physics and Astronomy; Predictions; Protein engineering; Protein structure; Proteins; Proteins - genetics; Soft and Granular Matter; The Revolutionary Impact of Landscapes in Biology; Microenvironment; Convolutional neural network
Summary:	One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding a site of interest. We find that the network can predict the wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely to be correct. Predictions of the consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.
ISSN:	0092-0606 1573-0689
DOI:	10.1007/s10867-021-09593-6