SOLpro: accurate sequence-based prediction of protein solubility

Bibliographic Details
Published in Bioinformatics Vol. 25; no. 17; pp. 2200-2207
Main Authors Magnan, Christophe N., Randall, Arlo, Baldi, Pierre
Format Journal Article
Language English
Published Oxford Oxford University Press 01.09.2009
Oxford Publishing Limited (England)

Summary:
Motivation: Protein insolubility is a major obstacle for many experimental studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be soluble on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the solubility of insoluble proteins.
Results: Here, we first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, we extract and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to standard evaluation metrics, with an overall accuracy of over 74% estimated using multiple runs of 10-fold cross-validation.
Availability: SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratch.proteomics.ics.uci.edu.
Contact: pfbaldi@ics.uci.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
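The summary mentions features computed directly from the primary sequence and accuracy estimated by 10-fold cross-validation. As a rough illustration only, the sketch below computes one simple sequence-derived feature group (amino-acid frequencies) and builds 10-fold train/test splits. It is not SOLpro's feature set or pipeline: the function names `composition`, `k_fold_indices` and `accuracy` are hypothetical, the choice of amino-acid frequencies is an assumption, and the SVM training step is left out entirely.

```python
import random
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Return the 20 amino-acid frequencies of a protein sequence.

    Illustrative feature only; the paper's 23 feature groups are not
    reproduced here.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In such a setup, a classifier would be trained on the `train` indices of each fold and scored with `accuracy` on the held-out `test` indices, and the per-fold scores averaged.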
Associate Editor: Burkhard Rost
istex:3386A2A21ED0B0EC10BD567F6A6E2CAF9D683121
ark:/67375/HXZ-R5FJPCG1-H
ArticleID:btp386
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/btp386