Identifying suitable tools for variant detection and differential gene expression using RNA-seq data

Bibliographic Details
Published in Genomics (San Diego, Calif.) Vol. 112; no. 3; pp. 2166-2172
Main Authors Dharshini, S. Akila Parvathy; Taguchi, Y.-H.; Gromiha, M. Michael
Format Journal Article
Language English
Published United States Elsevier Inc 01.05.2020
Summary: Neurodegenerative diseases are the most predominant brain disorders around the globe, and the affected populations are rapidly increasing. Recently, these diseases have been studied using data obtained from RNA-sequencing technology to reveal changes in gene/transcript expression, the effects of variants, and the pathways involved in disease mechanisms. However, the observations depend mainly on the aligners/tools used, and the performance of existing RNA-seq tools on the hg38 genome assembly has not yet been documented. In this study, we performed a systematic analysis of various spliced aligners, transcript assembly tools, and variant calling tools based on both genome assemblies (hg19/hg38), using data from hippocampus brain tissue. This helps to identify the best possible combination of tools for hg38 annotation. To evaluate the variants identified by the various pipelines, we compared them with expression quantitative trait loci (eQTL) and genome-wide association study (GWAS) data. In addition, the identified differentially expressed genes (DG) were compared with microarray studies. In our analysis of variant calling, the combination of GATK (Genome Analysis Toolkit) and STAR (Spliced Transcripts Alignment to a Reference) yields a larger number of GWAS/eQTL variants than SAMtools (Sequence Alignment/Map). We also identified a higher number of non-coding variants in hg38 than in hg19, owing to enhanced annotation. Among the various DG pipelines, Salmon-based hg38 transcriptomic quantification yields a higher number of reported DG than the genome-based quantification methods. This study revealed that a higher number of reads map to multiple locations of the genome with hg38 than with hg19, and these spurious multi-mapped reads may affect gene quantification techniques. We suggest that it is necessary to develop efficient algorithms that can handle multi-mapped reads and improve the performance of genome-based alignment quantification.
• We evaluated RNA-seq pipelines based on the hg38 genome assembly.
• In spliced alignment, the rate of multi-mapped reads is higher with hg38 than with hg19.
• The aligners are not able to distinguish the origin of the reads and tend to map them to multiple locations.
• The GATK/STAR variant calling protocol yields a larger number of GWAS variants from RNA-seq data.
• Transcriptome-based quantification outperforms genome-based quantification methods.
ISSN: 0888-7543
1089-8646
DOI: 10.1016/j.ygeno.2019.12.011