Integration and transfer learning of single-cell transcriptomes via cFIT

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences - PNAS Vol. 118; no. 10; pp. 1 - 8
Main Authors	Peng, Minshi, Li, Yue, Wamsley, Brie, Wei, Yuting, Roeder, Kathryn
Format	Journal Article
Language	English
Published	United States National Academy of Sciences 09.03.2021
Subjects	Animals; Biological Sciences; Brain; Brain cells; Data integration; Datasets; Developmental stages; Exome Sequencing; Gene sequencing; Humans; Integration; Iterative methods; Knowledge management; Learning; Machine Learning; Mice; Physical Sciences; Ribonucleic acid; RNA; RNA-Seq; Single-cell RNA-seq; Single-Cell Analysis; Software; Transcription; Transcriptome; Transfer learning
Summary:	Large, comprehensive collections of single-cell RNA sequencing (scRNA-seq) datasets have been generated that allow for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets or transfer knowledge from one to the other to better understand cellular identity and functions. Here, we present a simple yet surprisingly effective method named common factor integration and transfer learning (cFIT) for capturing various batch effects across experiments, technologies, subjects, and even species. The proposed method models the shared information between various datasets by a common factor space while allowing for unique distortions and shifts in genewise expression in each batch. The model parameters are learned under an iterative nonnegative matrix factorization (NMF) framework and then used for synchronized integration from across-domain assays. In addition, the model enables transferring via low-rank matrix from more informative data to allow for precise identification in data of lower quality. Compared with existing approaches, our method imposes weaker assumptions on the cell composition of each individual dataset; however, it is shown to be more reliable in preserving biological variations. We apply cFIT to multiple scRNA-seq datasets of developing brain from human and mouse, varying by technologies and developmental stages. The successful integration and transfer uncover the transcriptional resemblance across systems. The study helps establish a comprehensive landscape of brain cell-type diversity and provides insights into brain development.
Bibliography:	Contributed by Kathryn Roeder, December 13, 2020 (sent for review November 25, 2020; reviewed by Eric Courchesne and Yun Li). Author contributions: M.P., B.W., Y.W., and K.R. designed research; M.P. and Yue Li performed research; M.P., Yue Li, Y.W., and K.R. contributed new reagents/analytic tools; M.P. and Yue Li analyzed data; and M.P., Yue Li, Y.W., and K.R. wrote the paper. Reviewers: E.C., University of California San Diego; and Yun Li, University of North Carolina at Chapel Hill.
ISSN:	0027-8424 1091-6490
DOI:	10.1073/pnas.2024383118
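
Illustration:	The Summary describes a common factor space shared across datasets, with batch-specific distortions and shifts in genewise expression. The record gives no equations, so the following is only a minimal sketch under stated assumptions: the distortion is taken to be a per-gene multiplicative scale, the shift a per-gene additive offset, and the batch parameters are treated as known rather than learned by the iterative NMF procedure the Summary mentions. All names (`W`, `simulate_batch`, `remove_batch_effect`) are hypothetical and not taken from the cFIT software.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_factors = 50, 3

# Assumed shared, nonnegative common factor matrix (genes x factors).
W = rng.gamma(shape=2.0, scale=1.0, size=(n_genes, n_factors))


def simulate_batch(n_cells, noise_sd=0.01):
    """Simulate one batch as X = (H W^T) * scale + shift + noise.

    Hypothetical helper: the scale/shift form is an assumption made
    for illustration, not a formula quoted from the record.
    """
    H = rng.dirichlet(np.ones(n_factors), size=n_cells)  # nonnegative cell loadings
    scale = rng.uniform(0.5, 1.5, size=n_genes)          # per-gene distortion
    shift = rng.uniform(0.0, 1.0, size=n_genes)          # per-gene shift
    noise = rng.normal(0.0, noise_sd, size=(n_cells, n_genes))
    X = (H @ W.T) * scale + shift + noise
    return X, H, scale, shift


def remove_batch_effect(X, scale, shift):
    """Undo the per-gene shift and scale so batches share one expression space."""
    return (X - shift) / scale


# Two batches with different sizes and different batch-specific parameters.
batches = [simulate_batch(200), simulate_batch(300)]
corrected = [remove_batch_effect(X, s, b) for X, _, s, b in batches]

# After correction, each batch should be close to its shared-space signal H W^T.
errors = [
    float(np.abs(Xc - H @ W.T).max())
    for Xc, (_, H, _, _) in zip(corrected, batches)
]
print(errors)
```

In this sketch the corrected matrices from both batches lie in the same space spanned by `W`, which is the sense in which a shared factor model permits joint analysis; estimating `W`, the loadings, and the batch parameters from data is the part the published method addresses and is not attempted here.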