Subcellular localization for Gram positive and Gram negative bacterial proteins using linear interpolation smoothing model


Bibliographic Details
Published in	Journal of Theoretical Biology, Vol. 386, pp. 25-33
Main Authors	Saini, Harsh, Raicar, Gaurav, Dehzangi, Abdollah, Lal, Sunil, Sharma, Alok
Format	Journal Article
Language	English
Published	England Elsevier Ltd 07.12.2015
Subjects	Algorithms; Bacteria; Bacterial Proteins - metabolism; Dependency models; Feature extraction; Gram-Negative Bacteria - metabolism; Gram-Positive Bacteria - metabolism; Hidden Markov models; Markov Chains; Models, Statistical; Natural Language Processing; Proteomics - methods; Sensitivity and Specificity; Subcellular Fractions - metabolism

More Information
Summary:	Protein subcellular localization is an important topic in proteomics since it is related to a protein's overall function, helps in understanding metabolic pathways, and aids drug design and discovery. In this paper, a basic approximation technique from natural language processing called the linear interpolation smoothing model is applied to predicting protein subcellular localizations. The proposed approach extracts features from syntactical information in protein sequences to build probabilistic profiles using dependency models, which are used in linear interpolation to determine how likely a sequence is to belong to a particular subcellular location. This technique builds a statistical model based on maximum likelihood. It deals effectively with the high dimensionality that hinders traditional classifiers such as Support Vector Machines or k-Nearest Neighbours, without sacrificing performance. The approach has been evaluated by predicting subcellular localizations of Gram-positive and Gram-negative bacterial proteins.
• We introduce a novel classifier, linear interpolation, for subcellular localization.
• Inspiration to use this technique came from natural language processing.
• The technique tries to model dependencies between amino acids.
• We achieved good results on two bacterial datasets.
ISSN:	0022-5193 1095-8541
DOI:	10.1016/j.jtbi.2015.08.020
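
The Summary describes scoring a sequence against per-location probabilistic profiles that are combined by linear interpolation. A minimal Python sketch of that general idea follows, under stated assumptions: it interpolates only a bigram (adjacent-residue) model with a unigram model, uses a fixed weight `lam`, and floors the unigram term with add-one counts. None of these choices, the class and function names, or the toy sequences come from the paper; this record does not give the authors' actual dependency models or weights.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


class InterpolatedBigramModel:
    """Hypothetical per-location profile: bigram and unigram estimates
    combined by linear interpolation (illustrative, not the paper's model)."""

    def __init__(self, sequences, lam=0.7):
        self.lam = lam              # assumed fixed interpolation weight
        self.unigrams = Counter()   # residue counts
        self.bigrams = Counter()    # (previous, current) residue pair counts
        self.context = Counter()    # how often a residue starts a pair
        for seq in sequences:
            self.unigrams.update(seq)
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.context[prev] += 1
        self.total = sum(self.unigrams.values())

    def prob(self, prev, cur):
        # Unigram estimate with an add-one floor (an assumption) so that
        # residues unseen in training never yield a zero probability.
        p_uni = (self.unigrams[cur] + 1) / (self.total + len(AMINO_ACIDS))
        # Maximum-likelihood bigram estimate; zero if the context is unseen.
        seen = self.context[prev]
        p_bi = self.bigrams[(prev, cur)] / seen if seen else 0.0
        # Linear interpolation of the two estimates.
        return self.lam * p_bi + (1.0 - self.lam) * p_uni

    def log_likelihood(self, seq):
        return sum(math.log(self.prob(p, c)) for p, c in zip(seq, seq[1:]))


def predict(models, seq):
    """Return the location whose model assigns seq the highest log-likelihood."""
    return max(models, key=lambda loc: models[loc].log_likelihood(seq))


# Invented toy training data, for illustration only.
models = {
    "cytoplasm": InterpolatedBigramModel(["AAAAAA", "AAAKAA"]),
    "cell membrane": InterpolatedBigramModel(["LLLLLL", "LLVLLL"]),
}
print(predict(models, "AAKAA"))
```

The interpolation is what keeps the model usable when most residue pairs never occur in the training data for a location: an unseen pair falls back on the lower-order estimate instead of driving the whole sequence likelihood to zero.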