Comparison of Normalization Methods for Analysis of TempO-Seq Targeted RNA Sequencing Data

Bibliographic Details
Published in: Frontiers in Genetics, Vol. 11, p. 594
Main Authors: Bushel, Pierre R., Ferguson, Stephen S., Ramaiahgari, Sreenivasa C., Paules, Richard S., Auerbach, Scott S.
Format: Journal Article
Language: English
Published: Frontiers Media S.A., 23.06.2020

More Information
Summary: Analysis of bulk RNA sequencing (RNA-Seq) data is a valuable tool for understanding transcription at the genome scale. Targeted sequencing of RNA has emerged as a practical means of assessing the majority of the transcriptomic space with less reliance on large resources for consumables and bioinformatics. TempO-Seq is a templated, multiplexed RNA-Seq platform that interrogates a panel of sentinel genes representative of genome-wide transcription. Nuances of the technology require proper preprocessing of the data. Various methods have been proposed and compared for normalizing bulk RNA-Seq data, but there has been little to no investigation of how these methods perform on TempO-Seq data. We simulated count data into two groups (treated vs. untreated) at seven fold-change (FC) levels (including no change) using control samples from human HepaRG cells run on TempO-Seq, and normalized the data using seven normalization methods. Upper Quartile (UQ) performed the best with regard to maintaining FC levels, as detected by a limma contrast between the treated and untreated groups. For all FC levels except the no-change and +1.5 levels, the specificity of UQ normalization was greater than 0.84 and its sensitivity greater than 0.90. Furthermore, K-means clustering of the simulated genes normalized by UQ agreed the most with the FC assignments [adjusted Rand index (ARI) = 0.67]. Despite assuming that the majority of genes are unchanged, the DESeq2 scaling factors normalization method performed reasonably well, as did the simple normalization procedures counts per million (CPM) and total counts (TC). These results suggest that for two-class comparisons of TempO-Seq data, UQ, CPM, TC, or DESeq2 normalization should provide reasonably reliable results at absolute FC levels ≥2.0. These findings will help guide researchers in normalizing TempO-Seq gene expression data for more reliable results.
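To make the four methods recommended in the summary concrete (CPM, TC, UQ, and DESeq2 scaling factors), here is a minimal NumPy sketch. It assumes a genes × samples matrix of raw counts and uses common textbook definitions of each method; the function names are hypothetical, and the study's own implementations (for example, the rescaling constants or how zero counts are handled) may differ.

```python
import numpy as np


def cpm(counts):
    """Counts per million: scale each sample (column) to one million total counts."""
    lib_sizes = counts.sum(axis=0)
    return counts / lib_sizes * 1e6


def total_count(counts):
    """Total-count (TC) scaling: divide by library size, rescale to the mean library size."""
    lib_sizes = counts.sum(axis=0)
    return counts / lib_sizes * lib_sizes.mean()


def upper_quartile(counts):
    """Upper-quartile (UQ) scaling, one common variant: divide each sample by the
    75th percentile of its non-zero counts, then rescale to the mean of those quartiles."""
    uq = np.array([np.percentile(col[col > 0], 75) for col in counts.T])
    return counts / uq * uq.mean()


def deseq2_size_factors(counts):
    """Median-of-ratios size factors in the style of DESeq2's default estimator."""
    # Keep only genes with non-zero counts in every sample so the log is defined.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples (the pseudo-reference sample).
    log_geo_means = log_counts.mean(axis=1)
    # Size factor per sample = median over genes of count / geometric mean.
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))


def deseq2_normalize(counts):
    """Divide each sample's counts by its median-of-ratios size factor."""
    return counts / deseq2_size_factors(counts)


if __name__ == "__main__":
    # Toy matrix (hypothetical): sample 2 is sample 1 sequenced twice as deeply.
    counts = np.array([[10, 20], [30, 60], [60, 120]], dtype=float)
    print(deseq2_size_factors(counts))  # ≈ [0.7071, 1.4142]
    print(upper_quartile(counts))
```

Because the second toy sample is an exact 2× scaling of the first, every method above maps both columns to identical values. On real TempO-Seq data containing differentially expressed genes the methods diverge, and that divergence is what the study compares.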
Reviewed by: Fei Li, Zhejiang University, China; Gonzalo Riadi, University of Talca, Chile
Edited by: Dapeng Wang, University of Leeds, United Kingdom
This article was submitted to Computational Genomics, a section of the journal Frontiers in Genetics
ISSN: 1664-8021
DOI: 10.3389/fgene.2020.00594