Ensemble classifier for protein fold pattern recognition
Published in | Bioinformatics Vol. 22; no. 14; pp. 1717 - 1722 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Oxford: Oxford University Press, 15.07.2006; Oxford Publishing Limited (England) |
Summary: | Motivation: Prediction of protein folding patterns is one level deeper than prediction of protein structural classes, and hence is much more complicated and difficult. To deal with this challenging problem, an ensemble classifier was introduced. It was formed from a set of basic classifiers, each trained on a different parameter system, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, as well as different dimensions of pseudo-amino acid composition, all extracted from a training dataset. The operation engine for the constituent individual classifiers was the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule. Their outcomes were combined through weighted voting to give a final determination for classifying a query protein. The recognition task was to find the true fold among the 27 possible patterns. Results: The overall success rate thus obtained was 62% for a testing dataset where most of the proteins have <25% sequence identity with the proteins used in training the classifier. This rate is 6–21% higher than the corresponding rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, implying that the ensemble classifier is very promising and might become a useful vehicle in protein science, as well as in proteomics and bioinformatics. Availability: The ensemble classifier, called PFP-Pred, is available as a web server for public use. Contact: lifesci-sjtu@san.rr.com Supplementary information: Supplementary data are available on Bioinformatics online. |
---|---|
Bibliography: | Associate Editor: Keith A Crandall. ark:/67375/HXZ-4TJMTNVH-5; istex:E84CE0D07509B4A41E1194B52A9BAFEF48501CB4 |
ISSN: | 1367-4803 1460-2059 1367-4811 |
DOI: | 10.1093/bioinformatics/btl170 |
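
The summary describes an ensemble in which one classifier is trained per parameter system and the individual outcomes are merged by weighted voting. The following is a minimal sketch of that general idea only, not the PFP-Pred implementation: a plain k-nearest-neighbour vote stands in for the OET-KNN rule, and the function names, feature vectors, fold labels, and weights are all hypothetical toy values chosen for illustration.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Plain k-nearest-neighbour majority vote (a stand-in, NOT OET-KNN)."""
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def ensemble_predict(views, query_views, weights, k=3):
    """Sum each base classifier's weight onto the label it predicts."""
    scores = Counter()
    for name, train in views.items():
        label = knn_predict(train, query_views[name], k)
        scores[label] += weights[name]
    return scores.most_common(1)[0][0]


# Hypothetical toy data: one training set per parameter system.
views = {
    "hydrophobicity": [
        ((0.0, 0.0), "fold_A"), ((0.1, 0.2), "fold_A"),
        ((1.0, 1.0), "fold_B"), ((0.9, 1.1), "fold_B"),
    ],
    "polarity": [
        ((0.0, 1.0), "fold_A"), ((0.2, 0.9), "fold_A"),
        ((1.0, 0.0), "fold_B"), ((0.8, 0.1), "fold_B"),
    ],
}
# The same query protein, encoded once per parameter system.
query = {"hydrophobicity": (0.2, 0.1), "polarity": (0.9, 0.2)}
# Assumed voting weights; the record does not state how they were chosen.
weights = {"hydrophobicity": 0.4, "polarity": 0.6}

print(ensemble_predict(views, query, weights))
```

In this toy setup the two base classifiers disagree, so the predicted fold is decided by whichever parameter system carries the larger weight.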