Regulatory network-based imputation of dropouts in single-cell RNA sequencing data

Bibliographic Details
Published in	PLoS Computational Biology Vol. 18; no. 2; p. e1009849
Main Authors	Leote, Ana Carolina; Wu, Xiaohui; Beyer, Andreas
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 01.02.2022
Subjects	Biology and Life Sciences; Computer and Information Sciences; Computer applications; Datasets; Gene expression; Gene Expression Profiling; Gene Regulatory Networks - genetics; Gene sequencing; Genes; Genetic regulation; Genetic research; Humans; HyperText Markup Language; Kinases; Methods; Ribonucleic acid; RNA; School dropouts; Sequence Analysis, RNA; Similarity measures; Single-Cell Analysis - methods; Software; Transcription; Variation; Whole Exome Sequencing; Germany
Summary:	Single-cell RNA sequencing (scRNA-seq) methods are typically unable to quantify the expression levels of all genes in a cell, creating a need for the computational prediction of missing values ('dropout imputation'). Most existing dropout imputation methods are limited in the sense that they exclusively use the scRNA-seq dataset at hand and do not exploit external gene-gene relationship information. Further, it is unknown whether all genes benefit equally from imputation or which imputation method works best for a given gene. Here, we show that a transcriptional regulatory network learned from external, independent gene expression data improves dropout imputation. Using a variety of human scRNA-seq datasets, we demonstrate that our network-based approach outperforms published state-of-the-art methods. The network-based approach performs particularly well for lowly expressed genes, including cell-type-specific transcriptional regulators. Further, the cell-to-cell variation of 11.3% to 48.8% of the genes could not be adequately imputed by any of the methods that we tested. In those cases, gene expression levels were best predicted by the mean expression across all cells, i.e., assuming no measurable expression variation between cells. These findings suggest that different imputation methods are optimal for different genes. We thus implemented an R package called ADImpute (available via Bioconductor at https://bioconductor.org/packages/release/bioc/html/ADImpute.html) that automatically determines the best imputation method for each gene in a dataset. Our work represents a paradigm shift by demonstrating that there is no single best imputation method. Instead, we propose that imputation should maximally exploit external information and be adapted to gene-specific features, such as expression level and expression variation across cells.
Bibliography:	The authors have declared that no competing interests exist.
ISSN:	1553-734X 1553-7358
DOI:	10.1371/journal.pcbi.1009849
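
Illustration of the per-gene selection idea in the Summary. The following is a minimal Python sketch, not the ADImpute implementation or its API (ADImpute is an R package). It hides a fraction of the observed values, imputes them with each candidate method, and keeps for every gene the method with the lowest error on the hidden entries. Everything here is an assumption made for illustration: the function names, the coefficient matrix W standing in for a regulatory network learned from external data, the linear form of the network prediction, and the use of mean squared error as the selection criterion.

```python
import numpy as np


def gene_means(X, hidden):
    """Per-gene mean over the entries that are not hidden."""
    visible = ~hidden
    counts = np.maximum(visible.sum(axis=1), 1)  # avoid division by zero
    return (X * visible).sum(axis=1) / counts


def impute_mean(X, hidden):
    """Baseline: predict every cell of a gene as that gene's mean."""
    mu = gene_means(X, hidden)
    return np.repeat(mu[:, None], X.shape[1], axis=1)


def make_network_imputer(W):
    """Build an imputer from a genes x genes coefficient matrix W.

    W[g, j] is the assumed linear effect of gene j on gene g
    (hypothetical stand-in for an externally learned network).
    """
    def impute_network(X, hidden):
        mu = gene_means(X, hidden)
        # Hidden entries carry no information, so they contribute zero.
        centered = np.where(hidden, 0.0, X - mu[:, None])
        return mu[:, None] + W @ centered
    return impute_network


def select_method_per_gene(X, methods, hide_fraction=0.3, seed=0):
    """Return the best method name per gene and the per-gene errors.

    X is a genes x cells matrix (e.g. log-scaled expression).
    Only positive entries are hidden, so the held-out truth is known.
    """
    rng = np.random.default_rng(seed)
    hidden = (X > 0) & (rng.random(X.shape) < hide_fraction)
    n_hidden = np.maximum(hidden.sum(axis=1), 1)

    errors = {}
    for name, impute in methods.items():
        pred = impute(X, hidden)
        squared = np.where(hidden, (pred - X) ** 2, 0.0)
        errors[name] = squared.sum(axis=1) / n_hidden  # per-gene MSE

    names = list(methods)
    stacked = np.vstack([errors[name] for name in names])
    # Ties (including genes with no hidden entries) go to the first method.
    best = [names[i] for i in stacked.argmin(axis=0)]
    return best, errors


# Toy usage: gene 0 follows gene 1, gene 2 does not vary between cells.
toy_rng = np.random.default_rng(1)
regulator = toy_rng.normal(5.0, 1.0, size=200)
toy_X = np.vstack([2.0 + 0.8 * regulator, regulator, np.full(200, 3.0)])
toy_W = np.zeros((3, 3))
toy_W[0, 1] = 0.8  # edge that matches the data
toy_W[2, 1] = 0.5  # spurious edge for the non-varying gene
toy_methods = {"mean": impute_mean, "network": make_network_imputer(toy_W)}
toy_best, toy_errors = select_method_per_gene(toy_X, toy_methods)
print(toy_best)
```

In this toy setup the gene driven by a regulator is better served by the network-based prediction, while the gene without cell-to-cell variation is better served by its mean, which mirrors the Summary's point that no single method is best for all genes. A real analysis would also have to treat zeros as possible dropouts, which this sketch does not do.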