Mapping the glycosyltransferase fold landscape using interpretable deep learning

Bibliographic Details
Published in	Nature Communications Vol. 12; no. 1; p. 5656
Main Authors	Taujale, Rahil; Zhou, Zhongliang; Yeung, Wayland; Moremen, Kelley W.; Li, Sheng; Kannan, Natarajan
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK (Nature Portfolio), 27 September 2021
Subjects	631/114/1305; 631/114/2397; 631/114/2411; 631/114/663/2009; 631/114/794; Alignment; Amino acid sequence; Amino Acid Sequence - genetics; Artificial neural networks; Bioinformatics; Biosynthesis; Carbohydrates; Classification; Computational Biology - methods; Databases, Genetic; Datasets as Topic; Deep Learning; Divergence; Glycosylation; Glycosyltransferase; Glycosyltransferases - genetics; Glycosyltransferases - metabolism; Humanities and Social Sciences; Machine learning; Mapping; Model accuracy; multidisciplinary; Neural networks; Nucleotide sequence; Protein Folding; Protein structure; Protein Structure, Secondary - genetics; Protein Structure, Tertiary - genetics; Proteins; Science; Science (multidisciplinary); Secondary structure; Sequence Alignment; Structure-function relationships; Substrates
Summary:	Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and the glycosylation of diverse protein and small-molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a deep learning model based on a convolutional neural network with attention (CNN-attention) that leverages simple secondary structure representations generated from primary sequences to predict GT folds with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds, such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.

Glycosyltransferases (GTs) are proteins that display extensive sequence and functional variation on a subset of 3D folds. Here, the authors use interpretable deep learning to predict 3D folds from sequence without the need for sequence alignment, which also enables the prediction of GTs with new folds.
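The summary describes the model only at a high level. To illustrate the general idea of a convolution over a per-residue secondary structure encoding followed by attention pooling into fold-class scores, here is a minimal pure-Python sketch. It assumes a three-state H/E/C (helix/strand/coil) alphabet, a single convolution layer, single-vector attention, four hypothetical fold classes, and random untrained weights. None of these choices, sizes, or names (`predict_fold`, `init_params`, etc.) come from the paper, and this is not the authors' model. The per-position attention weights it returns suggest one way such a model can be inspected to see which regions drive a prediction.

```python
import math
import random

SS_ALPHABET = "HEC"  # assumed 3-state alphabet: helix, strand, coil


def one_hot(ss):
    """Encode a secondary structure string as a list of one-hot vectors."""
    return [[1.0 if ch == a else 0.0 for a in SS_ALPHABET] for ch in ss]


def conv1d_relu(x, kernels, bias):
    """Valid 1D convolution over positions, then ReLU.

    x: L x C_in; kernels: C_out x K x C_in; bias: C_out.
    Returns (L - K + 1) x C_out. Requires L >= K.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for f, kern in enumerate(kernels):
            s = bias[f] + sum(kern[j][c] * x[t + j][c]
                              for j in range(k) for c in range(len(x[0])))
            row.append(max(0.0, s))
        out.append(row)
    return out


def softmax(v):
    m = max(v)  # subtract max for numerical stability
    e = [math.exp(a - m) for a in v]
    z = sum(e)
    return [a / z for a in e]


def attention_pool(h, w_att):
    """Score each position, normalize with softmax, return weighted sum and weights."""
    alpha = softmax([sum(a * b for a, b in zip(row, w_att)) for row in h])
    pooled = [sum(alpha[t] * h[t][f] for t in range(len(h)))
              for f in range(len(h[0]))]
    return pooled, alpha


def init_params(n_filters=8, k=5, n_classes=4, seed=0):
    """Random, untrained weights (hypothetical; shapes for illustration only)."""
    rng = random.Random(seed)

    def g():
        return rng.gauss(0.0, 0.5)

    return {
        "kernels": [[[g() for _ in SS_ALPHABET] for _ in range(k)]
                    for _ in range(n_filters)],
        "bias": [0.0] * n_filters,
        "w_att": [g() for _ in range(n_filters)],
        "w_out": [[g() for _ in range(n_filters)] for _ in range(n_classes)],
    }


def predict_fold(ss, params):
    """Return (class probabilities, per-position attention weights)."""
    h = conv1d_relu(one_hot(ss), params["kernels"], params["bias"])
    pooled, alpha = attention_pool(h, params["w_att"])
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in params["w_out"]]
    return softmax(logits), alpha


if __name__ == "__main__":
    params = init_params()
    probs, attn = predict_fold("CCHHHHHHCCEEEECCEEEECCHHHHHHCC", params)
    print("class probabilities:", [round(p, 3) for p in probs])
    print("most attended window starts at position", attn.index(max(attn)))
```

Because the convolution slides over the secondary structure string directly, no sequence alignment is needed as input, which mirrors the alignment-free property described in the summary. With untrained weights the printed probabilities are meaningless; a real model would be trained on labeled GT families.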
ISSN:	2041-1723
DOI:	10.1038/s41467-021-25975-9