DPI_CDF: druggable protein identifier using cascade deep forest

Bibliographic Details
Published in	BMC Bioinformatics, Vol. 25, no. 1, article 145 (18 pages)
Main Authors	Arif, Muhammad; Fang, Ge; Ghulam, Ali; Musleh, Saleh; Alam, Tanvir
Format	Journal Article
Language	English
Published	England: BioMed Central Ltd (BMC), 05.04.2024
Subjects	Accuracy; Algorithms; Amino Acid Sequence; Analysis; Availability; Bioinformatics; Biological Evolution; Cascade deep forest; Computational Biology - methods; Datasets; Deep learning; Drug discovery; Druggable proteins; Drugs; Machine learning; Methods; Neural networks; NMR; Nuclear magnetic resonance; Peptides; Physicochemical features; Physiochemistry; Position-Specific Scoring Matrices; Protein sequencing; Proteins; PSSM; Qualitative analysis; Sequencing; Software; Support vector machines; Technology application; Therapeutic targets; Training; Qatar

More Information
Summary:	Drug targets in living beings play pivotal roles in the discovery of potential drugs. Conventional wet-lab characterization of drug targets, although accurate, is generally expensive, slow, and resource intensive. Computational methods are therefore highly desirable as an alternative to expedite the large-scale identification of druggable proteins (DPs); however, the performance of existing in silico predictors is still not satisfactory. In this study, we developed a novel deep learning-based model, DPI_CDF, for predicting DPs from protein sequence alone. DPI_CDF uses evolutionary-based (i.e., histograms of oriented gradients for position-specific scoring matrix), physicochemical-based (i.e., component protein sequence representation), and compositional-based (i.e., normalized qualitative characteristic) properties of the protein sequence to generate features. A hierarchical deep forest model then fuses these three encoding schemes to build the proposed model. Under 10-fold cross-validation, the proposed model achieved 99.13% accuracy and a Matthews correlation coefficient (MCC) of 0.982 on the training dataset. The generalization power of the trained model was further examined on an independent dataset, where it achieved a maximum accuracy of 95.01% and an MCC of 0.900. Compared with current state-of-the-art methods, DPI_CDF improves accuracy by 4.27% and 4.31% on the training and testing datasets, respectively. We believe DPI_CDF will help the research community identify druggable proteins and accelerate the drug discovery process. The benchmark datasets and source code are available on GitHub: http://github.com/Muhammad-Arif-NUST/DPI_CDF .
ISSN:	1471-2105
DOI:	10.1186/s12859-024-05744-3
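
The summary reports results as accuracy and Matthews correlation coefficient (MCC). As a reading aid, the following is a minimal sketch of how these two metrics are conventionally computed for a binary classifier from confusion-matrix counts. It is not taken from the DPI_CDF source code; the function names and the toy labels (1 = druggable, 0 = non-druggable) are illustrative assumptions.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1].

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels, made up for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)
print("accuracy:", round(accuracy(*counts), 4))
print("MCC:", round(mcc(*counts), 4))
```

Unlike accuracy, MCC uses all four confusion-matrix cells, so it stays informative when the positive and negative classes are unevenly represented.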