Benchmarking methods for detecting differential states between conditions from multi-subject single-cell RNA-seq data

Bibliographic Details
Published in	Briefings in bioinformatics Vol. 23; no. 5
Main Authors	Junttila, Sini; Smolander, Johannes; Elo, Laura L
Format	Journal Article
Language	English
Published	Oxford University Press, England, 20 September 2022 (Oxford Publishing Limited, England)
Subjects	Benchmarking; Gene Expression Profiling - methods; Gene sequencing; Humans; Problem Solving Protocol; RNA; RNA-Seq; Sequence Analysis, RNA - methods; Transcriptomes; Transcriptomics; RNA sequencing (RNA-seq); differential expression; single cell

More Information
Summary:	Single-cell RNA-sequencing (scRNA-seq) enables researchers to quantify the transcriptomes of thousands of cells simultaneously and to study transcriptomic changes between cells. scRNA-seq datasets increasingly include multisubject, multicondition experiments designed to investigate cell-type-specific differential states (DS) between conditions. Such an analysis can be performed by first identifying the cell types in all subjects and then performing a DS analysis between the conditions within each cell type. Naïve single-cell DS analysis methods that treat cells as statistically independent are prone to false positives in the presence of variation between biological replicates, an issue known as the pseudoreplicate bias. Although several methods have been introduced for statistical testing in multisubject scRNA-seq analysis, comparisons that include all of them are currently lacking. Here, we performed a comprehensive comparison of 18 methods for identifying DS changes between conditions from multisubject scRNA-seq data. Our results suggest that the pseudobulk methods generally performed best. Both pseudobulk methods and mixed models that model the subjects as a random effect were superior to the naïve single-cell methods that do not model the subjects in any way. Although the naïve methods achieved higher sensitivity than the pseudobulk methods and the mixed models, they produced a high number of false positives. In addition, accounting for subjects through latent variable modeling did not improve the performance of the naïve methods.
Note:	Sini Junttila and Johannes Smolander contributed equally to this work.
ISSN:	1467-5463, 1477-4054
DOI:	10.1093/bib/bbac286
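The summary describes the pseudobulk strategy only at a high level. The following is a minimal Python sketch of the idea, not the implementation of any of the 18 benchmarked methods. It sums raw counts per subject within each cell type, so that the subject rather than the cell becomes the unit of replication. It then compares conditions with a Welch t-statistic on log2(CPM + 1) values. That statistic is a simplified stand-in for the count-based models (for example edgeR, DESeq2 or limma) that pseudobulk pipelines typically use. The input layout (dicts with `subject`, `condition`, `cell_type` and `counts` keys) and the function names `pseudobulk`, `log_cpm` and `ds_test_statistic` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def pseudobulk(cells):
    """Sum raw counts per (cell_type, subject).

    Hypothetical input: an iterable of dicts with keys 'subject',
    'condition', 'cell_type' and 'counts' (gene -> raw count).
    """
    out = {}
    for cell in cells:
        key = (cell["cell_type"], cell["subject"])
        entry = out.setdefault(
            key, {"condition": cell["condition"], "counts": defaultdict(int), "n_cells": 0}
        )
        # A subject belongs to exactly one condition in this design.
        if entry["condition"] != cell["condition"]:
            raise ValueError(f"subject {cell['subject']} appears in more than one condition")
        for gene, n in cell["counts"].items():
            entry["counts"][gene] += n
        entry["n_cells"] += 1
    return out


def log_cpm(counts, gene):
    """log2(counts-per-million + 1) of one gene in one pseudobulk sample."""
    total = sum(counts.values())
    return math.log2(counts.get(gene, 0) / total * 1e6 + 1)


def ds_test_statistic(pb, cell_type, gene, cond_a, cond_b):
    """Welch t-statistic (cond_a minus cond_b) with one value per subject.

    Simplified stand-in for a count-based model; needs at least two
    subjects per condition and non-zero variance in at least one group.
    """
    a = [log_cpm(e["counts"], gene) for (ct, _), e in pb.items()
         if ct == cell_type and e["condition"] == cond_a]
    b = [log_cpm(e["counts"], gene) for (ct, _), e in pb.items()
         if ct == cell_type and e["condition"] == cond_b]
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))
```

Because each subject contributes a single value per cell type, the effective sample size of the test is the number of subjects. This is what guards against the pseudoreplicate bias: a naïve per-cell test would instead count every cell as an independent replicate and understate the between-subject variation.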