Identifying suitable tools for variant detection and differential gene expression using RNA-seq data
Published in | Genomics (San Diego, Calif.) Vol. 112; no. 3; pp. 2166 - 2172 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | United States : Elsevier Inc, 01.05.2020 |
Subjects | |
Summary: | Neurodegenerative diseases are the most predominant brain disorders around the globe, and the affected population is rapidly increasing. Recently, these diseases have been studied using data obtained from RNA-sequencing technology to reveal changes in gene/transcript expression, the effects of variants, and the pathways involved in disease mechanisms. However, the observations depend mainly on the aligners/tools used, and the performance of existing RNA-seq tools on the hg38 genome assembly has not yet been documented. In this study, we performed a systematic analysis of various spliced aligners, transcript assembly tools and variant calling tools on both genome assemblies (hg19/hg38) using data from hippocampus brain tissue. This helps to identify the best possible combination of tools for hg38 annotation. To evaluate the variants identified by the various pipelines, we compared them with expression quantitative trait loci (eQTL) and genome-wide association study (GWAS) data. In addition, the identified differentially expressed genes (DG) were compared with microarray studies. In our analysis of variant calling, the combination of the GATK (Genome Analysis Toolkit) and STAR (Spliced Transcripts Alignment to a Reference) protocol yielded a larger number of GWAS/eQTL variants than SAMtools (Sequence Alignment Map). We also identified a higher number of non-coding variants in hg38 than in hg19, owing to enhanced annotation. Among the DG pipelines, we found that Salmon-based hg38 transcriptomic quantification yields a higher number of reported DG than the genome-based quantification methods. This study revealed that a higher number of reads map to multiple locations of the genome with hg38 than with hg19, and these spurious multi-mapped reads may affect gene quantification techniques.
We suggest that it is necessary to develop efficient algorithms that can handle multi-mapped reads and improve the performance of genome-based alignment quantification.
• We evaluated the RNA-seq pipelines based on the hg38 genome assembly.
• In spliced alignment, the rate of multi-mapped reads is higher with hg38 than with hg19.
• The aligners are not able to distinguish the origin of the reads and tend to map them to multiple locations.
• The GATK/STAR variant calling protocol yields a larger number of GWAS variants from RNA-seq data.
• Transcriptome-based quantification outperforms the genome-based quantification methods. |
---|---|
ISSN: | 0888-7543 1089-8646 |
DOI: | 10.1016/j.ygeno.2019.12.011 |
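
The summary describes evaluating each pipeline's variant calls by comparing them with GWAS/eQTL data. The record does not say how that comparison was implemented, so the following is only a minimal sketch of one way to count the overlap. It assumes that variants are matched exactly on chromosome, position, reference allele and alternate allele, and that both sets use coordinates from the same assembly. The `Variant` type, the `overlap_with_reference` function and the toy records are hypothetical and are not taken from the study.

```python
from typing import Dict, Iterable, NamedTuple, Union


class Variant(NamedTuple):
    """Hypothetical variant key: exact match on all four fields."""
    chrom: str
    pos: int
    ref: str
    alt: str


def overlap_with_reference(
    called: Iterable[Variant], reference: Iterable[Variant]
) -> Dict[str, Union[int, float]]:
    """Count how many called variants also appear in a reference set.

    Both inputs are assumed to use the same genome assembly; hg19 and
    hg38 coordinates are not interchangeable without a liftover step.
    """
    called_set = set(called)          # de-duplicate the pipeline's calls
    reference_set = set(reference)    # e.g. GWAS or eQTL variants
    shared = called_set & reference_set
    fraction = len(shared) / len(called_set) if called_set else 0.0
    return {
        "called": len(called_set),
        "shared": len(shared),
        "fraction": fraction,
    }


# Toy, made-up records used only to exercise the function.
reference_variants = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
]
pipeline_a_calls = [
    Variant("chr1", 100, "A", "G"),
    Variant("chr2", 200, "C", "T"),
    Variant("chr3", 300, "G", "A"),
]
pipeline_b_calls = [Variant("chr1", 100, "A", "G")]

summary_a = overlap_with_reference(pipeline_a_calls, reference_variants)
summary_b = overlap_with_reference(pipeline_b_calls, reference_variants)
print(summary_a)
print(summary_b)
```

Comparing the `shared` counts across pipelines mirrors the kind of ranking reported in the summary, where one protocol recovers more GWAS/eQTL variants than another. Exact matching is the simplest choice; a real comparison might also need allele normalization or position-only matching, which this sketch does not attempt.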