Predicting the Protein Folding Rate Based on Sequence Feature Screening and Support Vector Regression

Bibliographic Details
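Published in Acta Physico-Chimica Sinica (物理化学学报), Vol. 30, no. 6, pp. 1091-1098
Main Authors LI Yong (李咏), ZHOU Wei (周玮), DAI Zhi-Jun (代志军), CHEN Yuan (陈渊), WANG Zhi-Ming (王志明), YUAN Zhe-Ming (袁哲明)
Format Journal Article
Language Chinese
Published Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128; 2014
ISSN 1000-6818
DOI 10.3866/PKU.WHXB201404091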

Abstract Predicting folding rates is important for elucidating the mechanism of protein folding. In this work, we collected 115 proteins with known folding rates, including two-state, multi-state, and mixed-state proteins. To characterize the primary structure of each protein as comprehensively as possible, we extracted sequence length, multi-scale amino acid residue composition, k-space features for residue pairs, and geostatistical association features computed over residue positions after replacing each residue with its physicochemical properties, so that each sequence was represented by a numeric vector of 9357 features. An improved binary matrix shuffling filter followed by multi-round worst-descriptor elimination (a nonlinear screening step) reduced these to 23 retained features with clear physicochemical meaning. A nonlinear support vector regression (SVR) model built on the 23 retained features reached a correlation coefficient of R = 0.95 under Jackknife cross-validation, exceeding both previously reported results and the other reference feature selection methods. Finally, an interpretability system for SVR showed that the nonlinear regression between folding rate and the retained descriptors was highly significant. Analysis of the effect of each retained descriptor indicated that the protein folding rate is closely related to sequence length, medium- and short-range association features, and triplet residue composition features, among others.
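The abstract names k-space features for residue pairs, but this record does not define them. The following is a minimal sketch under one common reading: a pair (a, b) is counted whenever b follows a with exactly k residues in between, and counts are normalized by the number of such windows. The function names `k_spaced_pair_features` and `encode`, the 20-letter alphabet, and the choice of `max_k` are illustrative assumptions, not the authors' implementation, and the result is not their 9357-dimensional encoding.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (assumed alphabet)


def k_spaced_pair_features(sequence, k):
    """Return the frequency of every residue pair separated by k residues.

    Assumption: a pair (a, b) is counted when b follows a with exactly k
    residues in between; counts are divided by the number of such windows.
    The paper's exact definition may differ.
    """
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    windows = len(sequence) - k - 1
    if windows <= 0:
        # Sequence too short for this gap: all frequencies are zero.
        return {pair: 0.0 for pair in counts}
    for i in range(windows):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:  # skip non-standard residues such as 'X'
            counts[pair] += 1
    return {pair: n / windows for pair, n in counts.items()}


def encode(sequence, max_k=2):
    """Concatenate sequence length with pair frequencies for k = 0..max_k."""
    vector = [float(len(sequence))]
    for k in range(max_k + 1):
        features = k_spaced_pair_features(sequence, k)
        # Sort pair names so every sequence yields the same feature order.
        vector.extend(features[pair] for pair in sorted(features))
    return vector
```

Calling `encode(seq, max_k=1)` yields 1 + 2 × 400 = 801 numbers per sequence. A vector built this way would be the input to a feature screening step and a regression model such as the SVR described in the abstract.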
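Author LI Yong (李咏), ZHOU Wei (周玮), DAI Zhi-Jun (代志军), CHEN Yuan (陈渊), WANG Zhi-Ming (王志明), YUAN Zhe-Ming (袁哲明)
AuthorAffiliation Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128, P. R. China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, P. R. China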
Author_FL YUAN Zhe-Ming
ZHOU Wei
DAI Zhi-Jun
CHEN Yuan
LI Yong
WANG Zhi-Ming
ClassificationCodes O641
ContentType Journal Article
Copyright Copyright © Wanfang Data Co. Ltd. All Rights Reserved.
Discipline Chemistry
DocumentTitleAlternate Predicting the Protein Folding Rate Based on Sequence Feature Screening and Support Vector Regression
EndPage 1098
GrantInformation The project was supported by the Specialized Research Fund for the Doctoral Program of Higher Education, China (20124320110002); the National Natural Science Foundation of China (31301388); and the Natural Science Foundation of Hunan Province, China (14JJ3092).
IsPeerReviewed true
IsScholarly true
Issue 6
Keywords Protein folding
Folding rate prediction
High-dimensional feature
Feature screening
Support vector regression
Language Chinese
Notes 11-1892/06
LI Yong, ZHOU Wei, DAI Zhi-Jun, CHEN Yuan, WANG Zhi-Ming, YUAN Zhe-Ming (Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128, P. R. China; Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, P. R. China)
PageCount 8
PublicationDate 2014
PublicationTitle Acta Physico-Chimica Sinica (物理化学学报)
PublicationYear 2014
StartPage 1091
SubjectTerms Folding rate prediction
Support vector regression
Feature screening
Protein folding
High-dimensional feature
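Title Predicting the Protein Folding Rate Based on Sequence Feature Screening and Support Vector Regression (基于序列特征筛选与支持向量回归预测蛋白质折叠速率)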
URI http://lib.cqvip.com/qk/92644X/201406/49795712.html
https://d.wanfangdata.com.cn/periodical/wlhxxb201406009
Volume 30