Graph embedding and Gaussian mixture variational autoencoder network for end-to-end analysis of single-cell RNA sequencing data

Bibliographic Details
Published in Cell reports methods Vol. 3; no. 1; p. 100382
Main Authors Xu, Junlin, Xu, Jielin, Meng, Yajie, Lu, Changcheng, Cai, Lijun, Zeng, Xiangxiang, Nussinov, Ruth, Cheng, Feixiong
Format Journal Article
Language English
Published United States Elsevier Inc 23.01.2023
Elsevier
Summary: Single-cell RNA sequencing (scRNA-seq) is a revolutionary technology to determine the precise gene expression of individual cells and identify cell heterogeneity and subpopulations. However, technical limitations of scRNA-seq lead to heterogeneous and sparse data. Here, we present autoCell, a deep-learning approach for scRNA-seq dropout imputation and feature extraction. autoCell is a variational autoencoding network that combines graph embedding and a probabilistic depth Gaussian mixture model to infer the distribution of high-dimensional, sparse scRNA-seq data. We validate autoCell on simulated datasets and biologically relevant scRNA-seq. We show that interpolation of autoCell improves the performance of existing tools in identifying cell developmental trajectories of human preimplantation embryos. We identify disease-associated astrocytes (DAAs) and reconstruct DAA-specific molecular networks and ligand-receptor interactions involved in cell-cell communications using Alzheimer’s disease as a prototypical example. autoCell provides a toolbox for end-to-end analysis of scRNA-seq data, including visualization, clustering, imputation, and disease-specific gene network identification.

• autoCell imputes heterogeneous and sparse sc/snRNA-seq data
• autoCell improves the performance of capturing cell developmental trajectories
• autoCell captures disease-relevant cellular pathobiology in latent space
• autoCell identifies cell-type-specific gene networks in Alzheimer’s disease

Single-cell RNA sequencing (scRNA-seq) enables researchers to study gene expression at cellular resolution. However, noise caused by amplification and dropout may hamper precise data analyses. It is urgent to develop scalable denoising methods to deal with the increasingly large, but sparse, scRNA-seq data. Here, we present autoCell, a graph-embedded Gaussian mixture variational autoencoder network algorithm for scRNA-seq dropout imputation and feature extraction. Our autoCell provides a deep-learning toolbox for end-to-end analysis of large-scale single-cell/nucleus RNA-seq data, including visualization, clustering, imputation, and disease-specific gene network identification.

Xu et al. develop a graph-embedded Gaussian mixture variational autoencoder network algorithm (termed autoCell) for end-to-end analyses of single-cell/nuclei RNA-seq data, including visualization, clustering, imputation, and cell-type-specific gene network identification. autoCell offers a useful tool for large-scale single-cell genomic data analyses to accelerate translational biology and disease discoveries.
ISSN: 2667-2375
DOI: 10.1016/j.crmeth.2022.100382