DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors

Bibliographic Details
Published in: Cell Systems, Vol. 8, no. 4, pp. 329-337.e4
Main Authors: McGinnis, Christopher S.; Murrow, Lyndsay M.; Gartner, Zev J.
Format: Journal Article
Language: English
Published: United States, Elsevier Inc., 24.04.2019

Summary: Single-cell RNA sequencing (scRNA-seq) data are commonly affected by technical artifacts known as “doublets,” which limit cell throughput and lead to spurious biological conclusions. Here, we present a computational doublet detection tool, DoubletFinder, that identifies doublets using only gene expression data. DoubletFinder predicts doublets according to each real cell’s proximity in gene expression space to artificial doublets created by averaging the transcriptional profiles of randomly chosen cell pairs. We first use scRNA-seq datasets where the identity of doublets is known to show that DoubletFinder identifies doublets formed from transcriptionally distinct cells. When these doublets are removed, the identification of differentially expressed genes is enhanced. Second, we provide a method for estimating DoubletFinder input parameters, allowing its application across scRNA-seq datasets with diverse distributions of cell types. Lastly, we present “best practices” for DoubletFinder applications and illustrate that DoubletFinder is insensitive to an experimentally validated kidney cell type with “hybrid” expression features.

Highlights:
• DoubletFinder uses gene expression features to predict doublets in scRNA-seq data
• DoubletFinder identifies doublets derived from transcriptionally distinct cells
• Doublet removal improves differential gene expression analysis performance
• DoubletFinder is insensitive to bona fide cells with “hybrid” expression profiles

In Brief: scRNA-seq data interpretation is confounded by technical artifacts known as doublets, that is, single-cell transcriptome data representing more than one cell. Moreover, scRNA-seq cellular throughput is purposefully limited to minimize doublet formation rates. By identifying cells sharing expression features with simulated doublets, DoubletFinder detects many real doublets and mitigates these two limitations.

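
The summary describes the core idea in two steps: simulate artificial doublets by averaging randomly chosen pairs of real cells, then score each real cell by how close it sits to those simulated doublets. The following is a minimal NumPy sketch of that idea, not the published R implementation. It assumes a cells × features matrix that is already normalized and reduced (for example, principal components) and averages those profiles directly, whereas the published tool re-processes the merged real and artificial data before its neighbor search and estimates its input parameters from the data. The function names, the neighborhood size `k`, the number of artificial doublets, the expected-doublet count, and the toy data are all hypothetical choices for illustration.

```python
import numpy as np


def make_artificial_doublets(X, n_doublets, rng):
    """Average the profiles of randomly chosen pairs of distinct real cells."""
    n_cells = X.shape[0]
    i = rng.integers(0, n_cells, size=n_doublets)
    # Shift by 1..n_cells-1 (mod n_cells) so that j never equals i.
    j = (i + rng.integers(1, n_cells, size=n_doublets)) % n_cells
    return (X[i] + X[j]) / 2.0


def artificial_neighbor_scores(X, n_doublets, k, seed=0):
    """Fraction of each real cell's k nearest neighbors that are artificial doublets."""
    rng = np.random.default_rng(seed)
    n_real = X.shape[0]
    merged = np.vstack([X, make_artificial_doublets(X, n_doublets, rng)])
    is_artificial = np.concatenate(
        [np.zeros(n_real, dtype=bool), np.ones(n_doublets, dtype=bool)]
    )
    # Squared Euclidean distance from every real cell to every merged profile.
    sq_dist = (
        (X ** 2).sum(axis=1)[:, None]
        + (merged ** 2).sum(axis=1)[None, :]
        - 2.0 * X @ merged.T
    )
    # The first n_real columns are the real cells; a cell is not its own neighbor.
    sq_dist[np.arange(n_real), np.arange(n_real)] = np.inf
    neighbors = np.argsort(sq_dist, axis=1)[:, :k]
    return is_artificial[neighbors].mean(axis=1)


def call_doublets(scores, n_expected):
    """Flag the n_expected highest-scoring cells as predicted doublets."""
    calls = np.zeros(scores.shape[0], dtype=bool)
    calls[np.argsort(scores)[::-1][:n_expected]] = True
    return calls


# Hypothetical toy data: two well-separated "cell types" (100 cells each),
# followed by 10 heterotypic doublets that average one cell of each type.
rng = np.random.default_rng(1)
type_a = rng.normal(0.0, 1.0, size=(100, 2))
type_b = rng.normal(10.0, 1.0, size=(100, 2))
true_doublets = (
    rng.normal(0.0, 1.0, size=(10, 2)) + rng.normal(10.0, 1.0, size=(10, 2))
) / 2.0
X = np.vstack([type_a, type_b, true_doublets])

scores = artificial_neighbor_scores(X, n_doublets=X.shape[0], k=20, seed=0)
calls = call_doublets(scores, n_expected=10)
print("true doublets among the 10 calls:", calls[200:].sum())
print("mean score, singlets vs. doublets:", scores[:200].mean(), scores[200:].mean())
```

Because the score only ranks cells, the sketch calls a fixed number of doublets; in practice that number would have to come from the expected doublet rate of the experiment, which is assumed here. The toy data also reflects the highlight that detection depends on transcriptional distinctness: a doublet of two type-A cells would sit inside the type-A cluster among singlets and would not stand out the way the heterotypic doublets do.
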
AUTHOR CONTRIBUTIONS
C.S.M., L.M.M., and Z.J.G. conceptualized the method and wrote the manuscript. C.S.M. wrote the software. C.S.M. and L.M.M. performed bioinformatics analyses.
ISSN: 2405-4712, 2405-4720
DOI: 10.1016/j.cels.2019.03.003