Probabilistic expression of spatially varied amino acid dimers into general form of Chou's pseudo amino acid composition for protein fold recognition

Bibliographic Details
Published in Journal of theoretical biology Vol. 380; pp. 291-298
Main Authors Saini, Harsh; Raicar, Gaurav; Sharma, Alok; Lal, Sunil; Dehzangi, Abdollah; Lyons, James; Paliwal, Kuldip K.; Imoto, Seiya; Miyano, Satoru
Format Journal Article
Language English
Published England Elsevier Ltd 07.09.2015
Summary: Identifying the tertiary structure (3D structure) of a protein is a fundamental problem in biology, and it helps in identifying the protein's function. Predicting a protein's fold is considered an intermediate step toward identifying its tertiary structure. Computational methods have been applied to determine a protein's fold by assembling information from its structural, physicochemical and/or evolutionary properties. In this study, we propose a scheme built on a feature extraction technique that extracts, from the Position Specific Scoring Matrix (PSSM), probabilistic expressions of amino acid dimers that have varying degrees of spatial separation in the primary sequences of proteins. An SVM classifier is used to create a fold recognition model from the extracted features. The performance of the proposed scheme is evaluated on three benchmark datasets, namely the Ding and Dubchak, Extended Ding and Dubchak, and Taguchi and Gromiha datasets. The proposed scheme performed well in the experiments conducted, improving on previously published results in the literature.
• Relationships between amino acid dimers that may be non-adjacent in sequence are explored.
• Features are extracted directly from the PSSM instead of raw counts from the primary sequence.
• SVM is used for classification.
• Good results were achieved on the Ding and Dubchak, Extended Ding and Dubchak, and Taguchi and Gromiha datasets.
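The summary does not give the exact formulation, so the following is only a minimal sketch of one way spatially separated dimer features could be computed from a PSSM. It assumes the PSSM is an L x 20 array of scores, that each row is turned into probabilities with a softmax (an assumption, not something the record states), and that the feature for a residue pair (i, j) at separation k is the sum over positions n of P[n, i] * P[n + k, j]. The function name `separated_dimer_features` and the parameter `max_gap` are hypothetical.

```python
import numpy as np


def separated_dimer_features(pssm, max_gap=2):
    """Return a flat vector of dimer features for separations 1..max_gap.

    pssm: array-like of shape (L, 20), one row of scores per residue.
    The result has length max_gap * 20 * 20.
    """
    pssm = np.asarray(pssm, dtype=float)

    # Assumption: convert each row of scores to probabilities with a
    # numerically stable softmax, so every row sums to 1.
    shifted = pssm - pssm.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)

    blocks = []
    for k in range(1, max_gap + 1):
        # Pair position n with position n + k and sum over n.
        # Entry (i, j) of the 20 x 20 block is sum_n P[n, i] * P[n + k, j].
        block = probs[:-k].T @ probs[k:]
        blocks.append(block.ravel())
    return np.concatenate(blocks)


# Illustrative input only: random scores stand in for a real PSSM.
rng = np.random.default_rng(0)
example_pssm = rng.normal(size=(30, 20))
example_features = separated_dimer_features(example_pssm, max_gap=3)
```

A vector like `example_features` could then be passed to any SVM implementation for fold classification; the kernel and hyperparameters used in the study are not stated in this record.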
ISSN:0022-5193, 1095-8541
DOI:10.1016/j.jtbi.2015.05.030