Mapping the glycosyltransferase fold landscape using interpretable deep learning

Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mappin...

Full description

Saved in:
Bibliographic Details
Published inNature communications Vol. 12; no. 1; p. 5656
Main Authors Taujale, Rahil, Zhou, Zhongliang, Yeung, Wayland, Moremen, Kelley W., Li, Sheng, Kannan, Natarajan
Format Journal Article
LanguageEnglish
Published London Nature Publishing Group UK 27.09.2021
Nature Publishing Group
Nature Portfolio
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies. Glycosyltransferases (GT) are proteins that display extensive sequence and functional variation on a subset of 3D folds. Here, the authors use interpretable deep learning to predict 3D folds from sequence without the need for sequence alignment, which also enables the prediction of GTs with new folds.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:2041-1723
2041-1723
DOI:10.1038/s41467-021-25975-9