EDClust: an EM-MM hybrid method for cell clustering in multiple-subject single-cell RNA sequencing

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 38; no. 10; pp. 2692-2699
Main Authors Wei, Xin; Li, Ziyi; Ji, Hongkai; Wu, Hao
Format Journal Article
Language English
Published England, Oxford University Press, 13.05.2022
Summary: Single-cell RNA sequencing (scRNA-seq) has revolutionized biological research by enabling the measurement of transcriptomic profiles at the single-cell level. With the increasing application of scRNA-seq in larger-scale studies, the problem of appropriately clustering cells emerges when the scRNA-seq data are from multiple subjects. One challenge is the subject-specific variation; systematic heterogeneity from multiple subjects may have a significant impact on clustering accuracy. Existing methods seeking to address such effects suffer from several limitations. We develop a novel statistical method, EDClust, for multi-subject scRNA-seq cell clustering. EDClust models the sequence read counts by a mixture of Dirichlet-multinomial distributions and explicitly accounts for cell-type heterogeneity, subject heterogeneity and clustering uncertainty. An EM-MM hybrid algorithm is derived for maximizing the data likelihood and clustering the cells. We perform a series of simulation studies to evaluate the proposed method and demonstrate the outstanding performance of EDClust. Comprehensive benchmarking on four real scRNA-seq datasets with various tissue types and species demonstrates the substantial accuracy improvement of EDClust compared to existing methods. The R package is freely available at https://github.com/weix21/EDClust. Supplementary data are available at Bioinformatics online.
Bibliography: The authors wish it to be known that, in their opinion, Xin Wei and Ziyi Li should be regarded as Joint First Authors.
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/btac168
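
Illustrative note: the summary describes modeling read counts with a mixture of Dirichlet-multinomial distributions and assigning cells to clusters with an EM-type algorithm. The sketch below is not the EDClust implementation and does not reproduce its parameterization; it only illustrates the generic idea of an E-step for such a mixture, under the assumption that each cluster k has its own concentration vector and mixing weight. Subject-specific effects and the MM update of the concentration parameters are omitted. The names `dm_logpmf`, `responsibilities`, `alphas` and `weights` are hypothetical and chosen for this example only.

```python
import math


def dm_logpmf(counts, alpha):
    """Log-probability of a count vector under a Dirichlet-multinomial.

    counts: non-negative integer read counts per gene for one cell
    alpha:  positive concentration parameters, one per gene
    """
    n = sum(counts)
    a = sum(alpha)
    # Terms that depend only on the totals.
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    # Per-gene terms.
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out


def responsibilities(counts, alphas, weights):
    """Generic E-step: posterior cluster probabilities for one cell.

    alphas:  one concentration vector per cluster (assumed parameterization)
    weights: mixing proportions, one per cluster, summing to 1
    """
    logs = [math.log(w) + dm_logpmf(counts, a) for w, a in zip(weights, alphas)]
    # Log-sum-exp normalization for numerical stability.
    m = max(logs)
    exps = [math.exp(v - m) for v in logs]
    total = sum(exps)
    return [e / total for e in exps]


# Toy data: one cell, three genes, two candidate clusters.
cell = [30, 2, 1]
alphas = [[10.0, 1.0, 1.0], [1.0, 10.0, 1.0]]
weights = [0.5, 0.5]
post = responsibilities(cell, alphas, weights)
```

In this toy setup, `post` holds the cell's posterior probability of belonging to each cluster, which is how a mixture model of this kind can express clustering uncertainty rather than a hard assignment.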