Ensemble classifier for protein fold pattern recognition



Bibliographic Details
Published in: Bioinformatics, Vol. 22, No. 14, pp. 1717–1722
Main Authors: Shen, Hong-Bin; Chou, Kuo-Chen
Format: Journal Article
Language: English
Published: Oxford: Oxford University Press, 15 July 2006; Oxford Publishing Limited (England)

Summary:
Motivation: Predicting protein folding patterns is one level deeper than predicting protein structural classes, and is therefore considerably more complicated and difficult. To address this challenging problem, an ensemble classifier was introduced. It was formed from a set of basic classifiers, each trained on a different parameter system extracted from a training dataset, such as predicted secondary structure, hydrophobicity, van der Waals volume, polarity, polarizability, and pseudo-amino acid composition of different dimensions. Each constituent classifier used the OET-KNN (optimized evidence-theoretic k-nearest neighbors) rule as its operation engine, and their outcomes were combined through weighted voting to give the final classification of a query protein. The task was to identify the true fold among 27 possible fold patterns.
Results: The overall success rate was 62% on a testing dataset in which most proteins share <25% sequence identity with the proteins used to train the classifier. This rate is 6–21% higher than the rates obtained by various existing NN (neural network) and SVM (support vector machine) approaches, suggesting that the ensemble classifier is very promising and may become a useful tool in protein science, proteomics and bioinformatics.
Availability: The ensemble classifier, called PFP-Pred, is available as a web server at for public usage.
Contact: lifesci-sjtu@san.rr.com
Supplementary information: Supplementary data are available at Bioinformatics online.
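The summary describes one classifier per feature set ("parameter system") whose individual predictions are fused by weighted voting. The Python sketch below illustrates only that fusion pattern and is not the authors' PFP-Pred implementation. Assumptions: a plain Euclidean k-nearest-neighbour majority vote is swapped in for the OET-KNN rule (which instead pools neighbour evidence using Dempster-Shafer theory); the function names `knn_predict`, `weighted_vote` and `ensemble_predict`, the fold labels, the toy feature vectors and the voting weights are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence

Vector = Sequence[float]


def knn_predict(train_X: Sequence[Vector], train_y: Sequence[Hashable],
                query: Vector, k: int = 3) -> Hashable:
    """Plain Euclidean k-NN majority vote (hypothetical helper).

    Assumption: a simplified stand-in for the OET-KNN rule used in the paper.
    """
    # Rank training samples by distance to the query; only distances are compared.
    ranked = sorted(zip(train_X, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    # most_common keeps first-seen order on ties, so the nearer label wins.
    return votes.most_common(1)[0][0]


def weighted_vote(predictions: Sequence[Hashable],
                  weights: Sequence[float]) -> Hashable:
    """Return the label whose supporting classifiers have the largest total weight."""
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    return max(scores, key=scores.__getitem__)


def ensemble_predict(feature_views: Sequence[Sequence[Vector]],
                     train_y: Sequence[Hashable],
                     query_views: Sequence[Vector],
                     weights: Sequence[float], k: int = 3) -> Hashable:
    """Run one k-NN classifier per feature view, then fuse by weighted voting."""
    predictions = [
        knn_predict(train_X, train_y, query, k)
        for train_X, query in zip(feature_views, query_views)
    ]
    return weighted_vote(predictions, weights)


if __name__ == "__main__":
    # Toy data: six training proteins described by two hypothetical feature views.
    train_y = ["fold_A"] * 3 + ["fold_B"] * 3
    view_1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
              (1.0, 1.0), (1.1, 1.0), (1.0, 1.1)]
    view_2 = [(0.2,), (0.3,), (0.1,), (0.9,), (0.8,), (1.0,)]
    query = [(0.05, 0.05), (0.25,)]  # one feature vector per view
    print(ensemble_predict([view_1, view_2], train_y, query, weights=[1.0, 1.0]))
    # prints fold_A
```

In the paper's setting each view would correspond to one parameter system (for example hydrophobicity or a given dimension of pseudo-amino acid composition), and the label set would be the 27 fold patterns. Keeping each view in its own classifier, rather than concatenating all features into one vector, lets the weights express how much each feature set is trusted when the views disagree.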
Associate Editor: Keith A Crandall
ark:/67375/HXZ-4TJMTNVH-5
istex:E84CE0D07509B4A41E1194B52A9BAFEF48501CB4
ISSN: 1367-4803; 1460-2059; 1367-4811
DOI: 10.1093/bioinformatics/btl170