Learning the pattern-based CRF for prediction of a protein local structure
Published in | Informatica (Ljubljana) Vol. 46; no. 6; pp. 135 - 141 |
---|---|
Main Authors | Mukanov, Zhalgas; Takhanov, Rustem |
Format | Journal Article |
Language | English |
Published | Ljubljana: Slovenian Society Informatika / Slovensko drustvo Informatika, 01.05.2022 |
Abstract | Predicting a protein's conformation from its amino acid sequence is widely acknowledged as one of the most important problems in computational biology and is a source of interesting problem formulations for machine learning. Here, methods of supervised learning stand side by side with statistical physics and information theory. According to the classical results of Anfinsen, a protein's conformational structure is fully determined by its primary structure, i.e., its amino acid sequence, and energy landscape theory states that the native state of a protein corresponds to the minimum of its free energy [2]. There are two dominant approaches to protein structure prediction: the first minimizes physics-based free energies with some unknown parameters, and the second is a knowledge-based approach that does not necessarily use the notion of free energy and aims only at high prediction accuracy [14]. Compared with these two, intermediate approaches are scarce: approaches whose goal is to find knowledge-based parameterizations of free energy that approximate the real free energy for certain protein families while achieving prediction accuracy comparable to that of purely knowledge-based approaches. According to M. Gromov, if energy landscape theory is true, then "probably, free energy can be encoded with a reasonable accuracy by something like 10^4 - 10^6 bits of information", and the main mathematical problem here is the lack of "general mathematical "parameter fitting" method(s), which, when applied to proteins, could provide (an effective version of) the total inter-residue interaction energies" [10]. In this paper, we introduce a probabilistic model based on a certain parametrization of free energy that we expect to be fruitful both for predicting protein dihedral angles and for investigating the structure of the energy landscape.
This model is based on the idea that free energy is largely determined by pairwise interactions between amino acids located near each other in the protein sequence. Although this assumption is far from reality for general proteins, we expect it to approximate the energy landscape of all-alpha proteins. |
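The idea stated in the abstract, that free energy is dominated by pairwise interactions between residues close together in the sequence, can be illustrated with a small sketch. This is not the authors' model: the function names, the `window` parameter, the two conformational states "H" and "E", and the toy energy table are all assumptions made for illustration. The sketch sums a hypothetical pairwise energy over nearby residue pairs and turns it into a Gibbs distribution over labelings by brute-force enumeration, which is feasible only for very short sequences.

```python
import itertools
import math


def chain_energy(seq, labels, pair_energy, window=1):
    """Sum a (hypothetical) pairwise energy over residues at most `window` apart."""
    total = 0.0
    n = len(seq)
    for i in range(n):
        # Only pairs (i, j) with 1 <= j - i <= window contribute.
        for j in range(i + 1, min(i + window, n - 1) + 1):
            total += pair_energy(seq[i], seq[j], labels[i], labels[j], j - i)
    return total


def gibbs_distribution(seq, states, pair_energy, window=1, temperature=1.0):
    """Return p(labels | seq) proportional to exp(-E / T) by enumerating all labelings."""
    labelings = list(itertools.product(states, repeat=len(seq)))
    weights = [
        math.exp(-chain_energy(seq, y, pair_energy, window) / temperature)
        for y in labelings
    ]
    z = sum(weights)  # partition function
    return {y: w / z for y, w in zip(labelings, weights)}


def toy_pair_energy(a, b, label_a, label_b, distance):
    """Assumed toy energy: reward two nearby residues that are both in state 'H'."""
    return -1.0 if label_a == label_b == "H" else 0.0


if __name__ == "__main__":
    p = gibbs_distribution("ACDE", ("H", "E"), toy_pair_energy, window=2)
    best = max(p, key=p.get)
    print(best, round(p[best], 3))
```

With this toy energy the lowest-energy labeling assigns "H" to every residue, loosely mirroring the abstract's expectation that a local pairwise parametrization suits all-alpha proteins. A practical implementation would replace the enumeration with dynamic programming over the chain and would learn the energy terms from data.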
Author | Mukanov, Zhalgas; Takhanov, Rustem |
CitedBy_id | crossref_primary_10_1007_s00224_023_10128_w |
ContentType | Journal Article |
Copyright | 2022. This work is published under https://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.31449/inf.v46i6.3787 |
Discipline | Computer Science |
EISSN | 1854-3871 |
EndPage | 141 |
ExternalDocumentID | 10_31449_inf_v46i6_3787 |
ISSN | 0350-5596 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 6 |
Language | English |
LinkModel | DirectLink |
OpenAccessLink | https://www.proquest.com/docview/2716599429?pq-origsite=%requestingapplication% |
PQID | 2716599429 |
PQPubID | 1616336 |
PageCount | 7 |
PublicationCentury | 2000 |
PublicationDate | 2022-05-01 |
PublicationDecade | 2020 |
PublicationPlace | Ljubljana |
PublicationTitle | Informatica (Ljubljana) |
PublicationYear | 2022 |
Publisher | Slovenian Society Informatika / Slovensko drustvo Informatika |
StartPage | 135 |
SubjectTerms | Accuracy; Algorithms; Amino acids; Energy; Free energy; Information theory; Labeling; Machine learning; Parameterization; Parameters; Probabilistic models; Proteins; Statistical analysis; Statistical physics |
Title | Learning the pattern-based CRF for prediction of a protein local structure |
URI | https://www.proquest.com/docview/2716599429 |
Volume | 46 |