Protein Contact Prediction by Integrating Joint Evolutionary Coupling Analysis and Supervised Learning
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 10.12.2013 |
Subjects | |
DOI | 10.48550/arxiv.1312.2988 |
Summary: | Protein contacts contain important information for studying protein
structure and function, but predicting contacts from sequence remains very
challenging. Both evolutionary coupling (EC) analysis and supervised machine
learning methods have been developed to predict contacts, each making use of a
different type of information. This paper presents a group graphical lasso
(GGL) method for contact prediction that integrates joint multi-family EC
analysis and supervised learning. Unlike existing single-family EC analysis,
which uses residue co-evolution information only in the target protein family,
our joint EC analysis uses residue co-evolution in both the target family and
its related families, which may have divergent sequences but similar folds. To
implement joint EC analysis, we model a set of related protein families using
Gaussian graphical models (GGM) and then co-estimate their precision matrices
by maximum likelihood, subject to the constraint that the precision matrices
share similar residue co-evolution patterns. To further improve the accuracy of
the estimated precision matrices, we employ a supervised learning method to
predict contact probabilities from a variety of evolutionary and
non-evolutionary information and then incorporate the predicted probabilities
as a prior into our GGL framework. Experiments show that our method predicts
contacts much more accurately than existing methods and performs better on
both conserved and family-specific contacts. |
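The summary describes the joint estimation only in words. The block below shows a generic group-graphical-lasso objective of the kind it describes, offered as a hedged illustration rather than the paper's exact formulation. It assumes the following:

- There are K related families whose alignments have been mapped onto common column indices, so that residue pair (i, j) refers to corresponding positions in every family.
- S^(k) is the empirical covariance of family k.
- Θ^(k) is its precision matrix.
- λ_ij is a per-pair penalty weight.

For simplicity each Θ_ij is treated as a scalar. When residues are one-hot encoded over amino-acid types, Θ_ij would be a block, and the norm would run over all of its entries. The mapping from a predicted contact probability p_ij to λ_ij shown on the right is a hypothetical choice. Per-family sample-size weights on the likelihood are omitted.

```latex
\max_{\Theta^{(1)},\dots,\Theta^{(K)} \succ 0}\;
  \sum_{k=1}^{K} \Big[ \log\det\Theta^{(k)} - \operatorname{tr}\!\big(S^{(k)}\Theta^{(k)}\big) \Big]
  \;-\; \sum_{i<j} \lambda_{ij}\,
  \Big\| \big(\Theta^{(1)}_{ij}, \Theta^{(2)}_{ij}, \dots, \Theta^{(K)}_{ij}\big) \Big\|_2,
\qquad
\lambda_{ij} = \lambda_0\,(1 - p_{ij})
```

The ℓ2 norm is taken over the same pair in all K families. As a result, the penalty tends either to zero that pair in every family at once or to keep it in all of them, which is one way to encode "similar residue co-evolution patterns." A smaller weight on pairs with high p_ij lets the supervised prior favor predicted contacts.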
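Iterative solvers for group-lasso objectives, such as proximal gradient or ADMM, repeatedly apply the proximal operator of the penalty. For a group ℓ2 norm, that operator is group soft-thresholding. The minimal Python sketch below shows this step for one residue pair across K families. It also includes a hypothetical `prior_weight` helper that lowers the penalty for pairs a supervised model rates as likely contacts. Both function names and the linear mapping λ0(1 − p) are illustrative assumptions, not code from the paper.

```python
import math
from typing import List


def group_soft_threshold(values: List[float], threshold: float) -> List[float]:
    """Proximal operator of threshold * ||values||_2 (group soft-thresholding).

    `values` holds the current estimates of one residue pair (i, j) in each of
    the K related families. The group is shrunk as a whole: either every entry
    is scaled toward zero by the same factor, or all entries become exactly 0.
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    norm = math.sqrt(sum(v * v for v in values))
    if norm <= threshold:
        # Weak joint evidence: drop the pair in every family at once.
        return [0.0] * len(values)
    scale = 1.0 - threshold / norm
    return [scale * v for v in values]


def prior_weight(base_lambda: float, contact_prob: float) -> float:
    """Hypothetical penalty weight lambda_ij = base_lambda * (1 - p_ij).

    A pair that the supervised predictor rates as a likely contact (p near 1)
    is penalized less, so it is more likely to survive thresholding.
    """
    if not 0.0 <= contact_prob <= 1.0:
        raise ValueError("contact_prob must lie in [0, 1]")
    return base_lambda * (1.0 - contact_prob)


if __name__ == "__main__":
    # One residue pair observed in two aligned families.
    pair_estimates = [3.0, 4.0]
    lam = prior_weight(base_lambda=5.0, contact_prob=0.5)  # 2.5
    print(group_soft_threshold(pair_estimates, lam))        # [1.5, 2.0]
```

As an example, the estimates [3.0, 4.0] have norm 5. A threshold of 2.5 therefore halves both to [1.5, 2.0], while any threshold of 5 or more sets the pair to zero in both families.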