Learning the pattern-based CRF for prediction of a protein local structure
Published in | Informatica (Ljubljana) Vol. 46; no. 6; pp. 135 - 141 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Ljubljana, Slovenian Society Informatika / Slovensko drustvo Informatika, 01.05.2022 |
Summary: | Prediction of a protein's conformation from its amino acid sequence is widely acknowledged as one of the most important problems in computational biology and is a source of interesting problem formulations for machine learning. Here, supervised learning methods stand side by side with statistical physics and information theory. According to the classical results of Anfinsen, a protein's conformational structure is fully determined by its primary structure, i.e., its amino acid sequence, and energy landscape theory states that the native state of a protein corresponds to the minimum of its free energy [2]. Two approaches dominate protein structure prediction: the first minimizes physics-based free energies with some unknown parameters, and the second is a knowledge-based approach that does not necessarily use the notion of free energy and aims only at high prediction accuracy [14]. Compared with these two, intermediate approaches are scarce: approaches that seek knowledge-based parameterizations of free energy which approximate the real free energy for certain protein families while reaching prediction accuracy comparable to purely knowledge-based methods. According to M. Gromov, if energy landscape theory is true, then "probably, free energy can be encoded with a reasonable accuracy by something like 10^4 - 10^6 bits of information", and the main mathematical problem here is the lack of "general mathematical 'parameter fitting' method(s), which, when applied to proteins, could provide (an effective version of) the total inter-residue interaction energies" [10]. In this paper, we introduce a probabilistic model based on a certain parameterization of free energy that we expect to be fruitful both for predicting protein dihedral angles and for investigating the structure of the energy landscape.
This model is based on the idea that free energy is largely determined by pairwise interactions of amino acids that are located near each other in the protein sequence. Although this assumption is far from realistic for general proteins, we expect it to approximate the energy landscape of all-alpha proteins. |
---|---|
ISSN: | 0350-5596 1854-3871 |
DOI: | 10.31449/inf.v46i6.3787 |
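
The modelling idea in the summary (free energy as a sum of pairwise interactions between residues that are close in the sequence, with the native state at the energy minimum) can be illustrated with a small sketch. This is not the authors' parameterization: the label set `STATES`, the window size, the random weights in `make_params`, and all function names are hypothetical, and the brute-force enumeration only works for very short sequences.

```python
import itertools
import math
import random

AMINO = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
STATES = ("H", "E", "C")         # hypothetical coarse bins for local structure


def make_params(window, seed=0):
    """Random illustrative weights theta[(d, a, b, s, t)]; not fitted to data."""
    rng = random.Random(seed)
    return {
        (d, a, b, s, t): rng.gauss(0.0, 1.0)
        for d in range(1, window + 1)
        for a in AMINO for b in AMINO
        for s in STATES for t in STATES
    }


def energy(seq, labels, theta, window):
    """Sum pairwise terms over residue pairs at sequence distance <= window."""
    total = 0.0
    n = len(seq)
    for i in range(n):
        for d in range(1, window + 1):
            j = i + d
            if j < n:
                total += theta[(d, seq[i], seq[j], labels[i], labels[j])]
    return total


def gibbs_distribution(seq, theta, window):
    """Exact P(labels | seq) proportional to exp(-energy), by enumeration."""
    labelings = list(itertools.product(STATES, repeat=len(seq)))
    weights = [math.exp(-energy(seq, y, theta, window)) for y in labelings]
    z = sum(weights)  # partition function
    return {y: w / z for y, w in zip(labelings, weights)}


def min_energy_labeling(seq, theta, window):
    """Labeling with minimal energy, i.e. the mode of the Gibbs distribution."""
    return min(
        itertools.product(STATES, repeat=len(seq)),
        key=lambda y: energy(seq, y, theta, window),
    )


if __name__ == "__main__":
    theta = make_params(window=3)
    seq = "ACDEF"
    dist = gibbs_distribution(seq, theta, window=3)
    best = min_energy_labeling(seq, theta, window=3)
    print("".join(best), round(dist[best], 4))
```

In this toy setting the most probable labeling coincides with the minimum-energy one, mirroring the energy landscape statement in the summary. A practical model would replace the enumeration with dynamic programming over the window and would fit the weights to data.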