EL-RMLocNet: An explainable LSTM network for RNA-associated multi-compartment localization prediction

Bibliographic Details
Published in Computational and structural biotechnology journal Vol. 20; pp. 3986-4002
Main Authors Asim, Muhammad Nabeel; Ibrahim, Muhammad Ali; Malik, Muhammad Imran; Zehe, Christoph; Cloarec, Olivier; Trygg, Johan; Dengel, Andreas; Ahmed, Sheraz
Format Journal Article
Language English
Published Netherlands Elsevier B.V. 01.01.2022
Research Network of Computational and Structural Biotechnology
Elsevier

Summary:
• A novel graph-based approach for transforming RNA sequences into statistical vectors.
• Development of a robust and precise deep learning classifier.
• Use of an attention mechanism capable of generating a comprehensive feature space.
• Incorporation of explainability suitable for highlighting distributions of nucleotides.
• Development of a public web server usable for predicting subcellular localizations.

Subcellular localization of ribonucleic acid (RNA) molecules provides significant insights into the functionality of RNAs and helps to explore their association with various diseases. Most existing predictors are single-compartment localization predictors (SCLPs), which cannot explain RNA associations with the diverse biochemical and pathological processes that mainly occur through RNA co-localization in multiple compartments. The few available multi-compartment localization predictors (MCLPs) achieve decent performance only for a target RNA class of a particular sub-type. Further, existing computational approaches have limited practical significance and limited potential to optimize therapeutics because of their poor model explainability. This paper presents an explainable Long Short-Term Memory (LSTM) network, “EL-RMLocNet”, whose predictive performance and interpretability are optimized using a novel GeneticSeq2Vec statistical representation learning scheme and an attention mechanism, for accurate multi-compartment localization prediction of different RNAs using only raw RNA sequences. GeneticSeq2Vec generates optimized statistical vectors of raw RNA sequences by capturing short- and long-range relations of nucleotide k-mers. From the sequence vectors generated by the GeneticSeq2Vec scheme, LSTM layers extract the most informative features, and an attention layer weights those features according to their discriminative potential for accurate multi-compartment localization prediction.
Through reverse engineering, the weights of the statistical feature space are mapped back to nucleotide k-mer patterns, making multi-compartment localization predictions transparent and explainable for different RNA classes and species. Empirical evaluation indicates that EL-RMLocNet outperforms the state-of-the-art predictor for subcellular localization prediction of four different RNA classes, by an average accuracy margin of 8% for Homo sapiens and 6% for Mus musculus. EL-RMLocNet is freely available as a web server at https://sds_genetic_analysis.opendfki.de/subcellular_loc/.
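
To make the k-mer view in the summary concrete, the following Python sketch splits an RNA sequence into overlapping k-mers and projects per-k-mer weights back onto individual nucleotide positions. It is only an illustration of the general idea of mapping feature weights to k-mer patterns, not the GeneticSeq2Vec scheme or the EL-RMLocNet attention layer. The function names `kmers` and `nucleotide_scores`, the choice of k = 3, the averaging rule, and the example weights are all assumptions made for this sketch.

```python
def kmers(seq, k=3):
    """Return the overlapping k-mers of an RNA sequence, left to right."""
    # Normalize case and treat DNA-style T as U (an assumption of this sketch).
    seq = seq.upper().replace("T", "U")
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def nucleotide_scores(seq, kmer_weights, k=3):
    """Average, for each position, the weights of the k-mers covering it.

    kmer_weights is a hypothetical mapping from k-mer string to a relevance
    weight (for example, something derived from an attention layer).
    k-mers missing from the mapping contribute a weight of 0.0.
    """
    totals = [0.0] * len(seq)
    counts = [0] * len(seq)
    for start, kmer in enumerate(kmers(seq, k)):
        weight = kmer_weights.get(kmer, 0.0)
        for pos in range(start, start + k):
            totals[pos] += weight
            counts[pos] += 1
    # Positions covered by no k-mer (sequence shorter than k) score 0.0.
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


if __name__ == "__main__":
    # Made-up weights, purely for illustration.
    example_weights = {"AUG": 1.0, "UGG": 0.5}
    print(kmers("AUGGC"))
    print(nucleotide_scores("AUGGC", example_weights))
```

Averaging over the covering k-mers is one simple choice. Taking the maximum or the sum would highlight positions differently, and the summary does not say which rule the authors use.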
ISSN: 2001-0370
DOI: 10.1016/j.csbj.2022.07.031