EL-RMLocNet: An explainable LSTM network for RNA-associated multi-compartment localization prediction
Published in | Computational and structural biotechnology journal Vol. 20; pp. 3986 - 4002 |
---|---|
Main Authors | , , , , , , , |
Format | Journal Article |
Language | English |
Published | Netherlands: Elsevier B.V, 01.01.2022; Research Network of Computational and Structural Biotechnology; Elsevier |
Summary: |
• A novel graph-based approach for transforming RNA sequences into statistical vectors.
• Development of a robust and precise deep learning classifier.
• Utilization of an attention mechanism capable of generating a comprehensive feature space.
• Incorporation of explainability suitable for highlighting distributions of nucleotides.
• Development of a public web server usable for predicting subcellular localizations.
Subcellular localization of ribonucleic acid (RNA) molecules provides significant insights into the functionality of RNAs and helps to explore their association with various diseases. Most existing predictors are single-compartment localization predictors (SCLPs), which cannot reveal RNA associations with the diverse biochemical and pathological processes that mainly arise from RNA co-localization in multiple compartments. The few available multi-compartment localization predictors (MCLPs) achieve decent performance only for a target RNA class of a particular sub-type. Further, existing computational approaches have limited practical significance and limited potential to optimize therapeutics because of their poor model explainability. This paper presents an explainable Long Short-Term Memory (LSTM) network, “EL-RMLocNet”, whose predictive performance and interpretability are optimized using a novel GeneticSeq2Vec statistical representation learning scheme and an attention mechanism, for accurate multi-compartment localization prediction of different RNAs using only raw RNA sequences. GeneticSeq2Vec generates optimized statistical vectors of raw RNA sequences by capturing short- and long-range relations of nucleotide k-mers. From the sequence vectors generated by GeneticSeq2Vec, the LSTM layers extract the most informative features, which an attention layer then weights according to their discriminative potential for accurate multi-compartment localization prediction. Through reverse engineering, the weights of the statistical feature space are mapped back to nucleotide k-mer patterns, making multi-compartment localization predictions transparent and explainable for different RNA classes and species. Empirical evaluation indicates that EL-RMLocNet outperforms the state-of-the-art predictor for subcellular localization prediction of 4 different RNA classes by an average accuracy of 8% for Homo sapiens and 6% for Mus musculus.
EL-RMLocNet is freely available as a web server at (https://sds_genetic_analysis.opendfki.de/subcellular_loc/). |
---|---|
ISSN: | 2001-0370 |
DOI: | 10.1016/j.csbj.2022.07.031 |
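
The summary mentions nucleotide k-mers and their short- and long-range relations without giving the details of GeneticSeq2Vec. As a rough illustration of the underlying idea only, the sketch below splits an RNA sequence into overlapping k-mers and counts how often one k-mer is followed by another within a small positional gap. The function names `kmers` and `kmer_pair_counts`, the parameter `max_gap`, and the counting scheme itself are assumptions made for this example; this is not the GeneticSeq2Vec algorithm from the paper.

```python
from collections import Counter


def kmers(seq, k=3):
    """Return the overlapping k-mers of an RNA sequence (stride 1).

    DNA-style input is tolerated by mapping T to U (an assumption of this sketch).
    """
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_pair_counts(seq, k=3, max_gap=2):
    """Count ordered k-mer pairs whose start positions are 1..max_gap apart.

    A small max_gap captures only short-range relations; a larger one also
    captures longer-range relations. Hypothetical scheme, for illustration.
    """
    tokens = kmers(seq, k)
    counts = Counter()
    for i, left in enumerate(tokens):
        for gap in range(1, max_gap + 1):
            if i + gap < len(tokens):
                counts[(left, tokens[i + gap])] += 1
    return counts


if __name__ == "__main__":
    example = "AUGGCUAUGG"
    print(kmers(example, k=3))
    print(kmer_pair_counts(example, k=3, max_gap=2).most_common(3))
```

The pair counts can be read as weighted edges of a k-mer graph, which is one simple way to turn a raw sequence into a fixed statistical description before feeding it to a downstream model.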