Learning the pattern-based CRF for prediction of a protein local structure

Bibliographic Details
Published in Informatica (Ljubljana) Vol. 46; no. 6; pp. 135 - 141
Main Authors Mukanov, Zhalgas; Takhanov, Rustem
Format Journal Article
Language English
Published Ljubljana Slovenian Society Informatika / Slovensko drustvo Informatika 01.05.2022
Abstract Prediction of protein conformation from its amino acid sequence is widely acknowledged as one of the most important problems in computational biology and is considered a source of interesting problem formulations for machine learning. Here methods of supervised learning stand side by side with statistical physics and information theory. According to the classical results of Anfinsen, a protein's conformational structure is fully determined by its primary structure, i.e., its amino acid sequence, and energy landscape theory says that the native state of a protein corresponds to the minimum of its free energy [2]. There are two dominant approaches to protein structure prediction: the first is based on minimizing physics-based free energies with some unknown parameters, and the second is a knowledge-based approach that does not necessarily use the notion of free energy and aims only to yield high prediction accuracy [14]. Compared with these two approaches, intermediate approaches are scarce; their goal is to find knowledge-based parameterizations of free energy that approximate the real free energy for certain protein families while achieving prediction accuracy comparable to that of purely knowledge-based approaches. According to M. Gromov, if energy landscape theory is true, then "probably, free energy can be encoded with a reasonable accuracy by something like 10^4 - 10^6 bits of information", and the main mathematical problem here is the lack of "general mathematical 'parameter fitting' method(s), which, when applied to proteins, could provide (an effective version of) the total inter-residue interaction energies" [10]. In this paper, we introduce a probabilistic model based on a certain parametrization of free energy that we expect could be fruitful both for predicting protein dihedral angles and for investigating the structure of the energy landscape.
This model is based on the idea that free energy is largely determined by pairwise interactions of amino acids that are located near each other in the protein sequence. Though this assumption is far from reality for general proteins, we expect it to approximate the energy landscape of all-alpha proteins.
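The idea in the abstract can be illustrated with a toy sketch. The following is not the authors' model or parametrization; it only assumes, for illustration, that conformations are discretized into a few hypothetical labels (`LABELS`), that the energy is a sum of pairwise terms over residues at most `window` positions apart, and that label sequences follow a Gibbs distribution proportional to exp(-energy). The parameter table `theta` and all function names are made up for this example, and the partition function is computed by brute-force enumeration, which is only feasible for very short sequences.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical discretized conformational states (an assumption of this sketch).
LABELS = ("H", "E", "C")


def energy(seq, labels, theta, window=2):
    """Sum pairwise terms over residue pairs at most `window` positions apart."""
    total = 0.0
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, min(i + window, n - 1) + 1):
            # Key: (residue_i, residue_j, label_i, label_j, sequence distance).
            total += theta[(seq[i], seq[j], labels[i], labels[j], j - i)]
    return total


def gibbs_distribution(seq, theta, window=2):
    """Brute-force p(labels | seq), proportional to exp(-energy)."""
    configs = list(itertools.product(LABELS, repeat=len(seq)))
    weights = [math.exp(-energy(seq, y, theta, window)) for y in configs]
    z = sum(weights)  # partition function by explicit enumeration
    return {y: w / z for y, w in zip(configs, weights)}


def map_labels(seq, theta, window=2):
    """Return the most probable (lowest-energy) label sequence."""
    dist = gibbs_distribution(seq, theta, window)
    return max(dist, key=dist.get)


# Usage: missing parameters default to zero energy.
theta = defaultdict(float)
theta[("A", "L", "H", "H", 1)] = -2.0  # favor H-H for adjacent residues A, L
best = map_labels("AL", theta)
```

In this sketch, learning would amount to fitting the entries of `theta` from labeled sequences, and a practical implementation would replace the enumeration with dynamic programming over the chain.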
Copyright 2022. This work is published under https://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.31449/inf.v46i6.3787
Discipline Computer Science
EISSN 1854-3871
ISSN 0350-5596
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
SubjectTerms Accuracy
Algorithms
Amino acids
Energy
Free energy
Information theory
Labeling
Machine learning
Parameterization
Parameters
Probabilistic models
Proteins
Statistical analysis
Statistical physics
URI https://www.proquest.com/docview/2716599429