Evaluation of zero counts to better understand the discrepancies between bulk and single-cell RNA-Seq platforms

Bibliographic Details
Published in	Computational and Structural Biotechnology Journal, Vol. 21, pp. 4663-4674
Main Authors	Zyla, Joanna; Papiez, Anna; Zhao, Jun; Qu, Rihao; Li, Xiaotong; Kluger, Yuval; Polanska, Joanna; Hatzis, Christos; Pusztai, Lajos; Marczyk, Michal
Format	Journal Article
Language	English
Published	Netherlands, Elsevier B.V., 01.01.2023; Research Network of Computational and Structural Biotechnology
Subjects	Dropout rate; Single-cell sequencing; Technical factors; Zeros
Summary:	Recent advances in sample preparation and sequencing technology have made it possible to profile the transcriptomes of individual cells using single-cell RNA sequencing (scRNA-Seq). Compared with bulk RNA-Seq data, single-cell data often contain a higher percentage of zero reads, mainly because of lower sequencing depth per cell, which mostly affects measurements of low-expression genes. However, discrepancies between platforms are observed regardless of expression level. Using four paired datasets with multiple samples each, we investigated technical and biological factors that can contribute to this expression shift. Using two separate machine learning models, we found that, in addition to expression level, RNA integrity, gene or UTR3 length, and the number of transcripts may also influence the occurrence of zeros. These findings could enable the development of novel analytical methods for cross-platform expression shift correction. To assist in interpreting analyses across transcriptomic platforms, we also identified genes and biological pathways in our diverse datasets that consistently differed between the single-cell and bulk levels. At the gene level, 25 genes (0.12%) were discordant in all datasets, whereas at the pathway level, 7 pathways (2.02%) showed shared enrichment in discordant genes.
Bibliography:	Present Address: Department of Data Science and Engineering, Silesian University of Technology, Akademicka 16, Gliwice, 44–100, Poland.
ISSN:	2001-0370
DOI:	10.1016/j.csbj.2023.09.035
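
The summary contrasts the share of zero reads in single-cell data with that in bulk data. As a rough illustration of what such a comparison measures, the sketch below computes a per-gene zero fraction across cells and a summed, bulk-like profile. It is not the authors' pipeline: the function names `zero_fraction_per_gene` and `pseudobulk` and the toy matrix are hypothetical, and summing cells is only an assumed stand-in for a real bulk measurement.

```python
def zero_fraction_per_gene(counts):
    """Return, for each gene (column), the fraction of cells (rows) with a zero count."""
    n_cells = len(counts)
    n_genes = len(counts[0])
    return [
        sum(1 for cell in counts if cell[g] == 0) / n_cells
        for g in range(n_genes)
    ]


def pseudobulk(counts):
    """Sum counts over all cells to obtain one bulk-like value per gene (an assumption)."""
    return [sum(column) for column in zip(*counts)]


# Hypothetical toy matrix: 4 cells (rows) x 3 genes (columns); the values are invented.
toy_counts = [
    [0, 5, 0],
    [0, 3, 1],
    [2, 0, 0],
    [0, 4, 0],
]

zero_frac = zero_fraction_per_gene(toy_counts)  # share of cells with zero reads, per gene
bulk_like = pseudobulk(toy_counts)              # summed counts, per gene

for gene_index, (frac, total) in enumerate(zip(zero_frac, bulk_like)):
    print(f"gene {gene_index}: zero fraction {frac:.2f}, summed count {total}")
```

In this toy case a gene can be zero in most individual cells while still having a nonzero summed count, which is the kind of platform-level difference the summary refers to.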