Structure-based Comparative Analysis and Prediction of N-linked Glycosylation Sites in Evolutionarily Distant Eukaryotes

Bibliographic Details
Published in Genomics, Proteomics & Bioinformatics Vol. 11; no. 2; pp. 96-104
Main Authors Lam, Phuc Vinh Nguyen; Goldman, Radoslav; Karagiannis, Konstantinos; Narsule, Tejas; Simonyan, Vahan; Soika, Valerii; Mazumder, Raja
Format Journal Article
Language English
Published China Elsevier Ltd 01.04.2013
Department of Biochemistry and Molecular Biology, George Washington University Medical Center, Washington, DC 20037, USA
Department of Oncology, Georgetown University, Washington, DC 20057, USA
Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, MD 20852, USA
Life Sciences Department, Paris Diderot University, Paris 75013, France
Summary: The asparagine-X-serine/threonine (NXS/T) motif, where X is any amino acid except proline, is the consensus motif for N-linked glycosylation. The large number of high-resolution crystal structures of glycosylated proteins now available allows structural analysis of N-linked glycosylation sites (NGS). Our analysis shows that there is enough structural information from diverse glycoproteins to develop rules that can be used to predict NGS. A Python-based tool was developed to investigate asparagines implicated in N-glycosylation in five species: Homo sapiens, Mus musculus, Drosophila melanogaster, Arabidopsis thaliana and Saccharomyces cerevisiae. In the human proteome, 78% of all asparagines of NXS/T motifs involved in N-glycosylation are localized in the loop/turn conformation. A similar distribution was found for all the other species examined. Comparative analysis of NXS/T motifs not known to be glycosylated and their reverse sequence (S/TXN) shows a similar distribution across the secondary structural elements, indicating that the NXS/T motif in itself is not biologically relevant. Based on our analysis, we have defined rules to determine NGS. Using machine learning methods based on these rules, we can predict with 93% accuracy whether a particular site will be glycosylated. If structural information is not available, the tool uses structural prediction results, resulting in 74% accuracy. The tool was used to identify glycosylation sites in 108 human proteins with structures and 2247 proteins without structures that have acquired one or more NXS/T sites through non-synonymous variation. The tool, Structure Feature Analysis Tool (SFAT), is freely available to the public at http://hive.biochemistry.gwu.edu/tools/sfat.
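The sequence-level part of the summary, locating NXS/T motifs and their reverse (S/TXN), can be illustrated with a minimal Python sketch. This is not SFAT's code: the function names, the regular expressions, and the toy sequence are assumptions made for illustration, and the sketch does not attempt the structural rules or the machine learning step.

```python
import re

# N, then any residue except proline, then S or T. The zero-width lookahead
# lets overlapping motifs (e.g. "NNSS") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")

# Reverse motif S/T-X-N. Excluding proline at X here simply mirrors the
# forward rule; that choice is an assumption of this sketch.
REVERSE = re.compile(r"(?=[ST][^P]N)")


def find_motif(sequence, pattern, asn_offset):
    """Return 1-based positions of the asparagine in every motif match.

    asn_offset is the 0-based index of N inside the three-residue motif.
    """
    seq = sequence.upper()
    return [m.start() + asn_offset + 1 for m in pattern.finditer(seq)]


def find_sequons(sequence):
    """Asparagine positions of candidate NXS/T sites (X is not proline)."""
    return find_motif(sequence, SEQUON, 0)


def find_reverse_motifs(sequence):
    """Asparagine positions of the reverse S/TXN motif."""
    return find_motif(sequence, REVERSE, 2)


if __name__ == "__main__":
    toy = "MKNGSAANPTLLNNSSTANQ"  # made-up sequence, not a real protein
    print(find_sequons(toy))         # [3, 13, 14]; N8 is skipped (X is proline)
    print(find_reverse_motifs(toy))  # [19]
```

A motif match only marks a candidate site. As the summary notes, the motif alone is not sufficient, so a real predictor would combine these positions with secondary-structure information.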
Bibliography: N-linked glycosylation; Gain and loss of glycosylation; nsSNP; nsSNV; Variation
11-4926/Q
http://dx.doi.org/10.1016/j.gpb.2012.11.003
USDOE Office of Science (SC), Biological and Environmental Research (BER). Biological Systems Science Division
ISSN: 1672-0229
2210-3244
DOI: 10.1016/j.gpb.2012.11.003