scDM: A deep generative method for cell surface protein prediction with diffusion model
Published in | Journal of Molecular Biology, Vol. 436, no. 12, article 168610 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Netherlands: Elsevier Ltd, 15 June 2024 |
Summary: |
• Considering both RNA and protein expression can provide more biological evidence.
• We propose a method for predicting protein expression based on the diffusion model.
• This is the first use of a diffusion model in the field of single-cell analysis.
• The proposed method was validated on three single-cell sequencing datasets.
• Our results provide new directions for the identification of novel drug targets.
Proteins are the executors of organismal functions, and the transition from RNA to protein is subject to post-transcriptional regulation; considering RNA and surface protein expression together can therefore provide additional evidence about biological processes. Cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) can measure both RNA and protein expression in single cells, but these experiments are expensive and time-consuming. Because computational tools for predicting surface proteins are lacking, we used datasets obtained with CITE-seq to design a deep generative prediction method based on diffusion models and derived biological findings from its predictions. Our method, scDM, predicts protein expression values from the RNA expression values of individual cells. It uses a novel way of encoding the data into the model and, during modelling, learns the data distribution by adding Gaussian noise and then gradually removing it, generating predicted samples through this denoising process. Comprehensive evaluation across different datasets showed that our predictions yielded satisfactory results and further demonstrated the effectiveness of incorporating single-cell multiomics information into diffusion models for biological studies. We also found that jointly analysing predicted surface protein expression values and cancer cell drug scores could suggest new directions for discovering therapeutic drug targets. |
---|---|
ISSN: | 0022-2836, 1089-8638 |
DOI: | 10.1016/j.jmb.2024.168610 |
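The summary describes scDM's generative process only at a high level: Gaussian noise is added and a model learns to remove it step by step. To make that idea concrete, below is a minimal sketch of the generic DDPM forward (noising) and reverse (denoising) steps in plain Python. It assumes a linear noise schedule, a network that predicts the added noise (ε-prediction), and a reverse-step variance of σ_t² = β_t. These are common DDPM defaults, not details confirmed for scDM, and the paper's data encoding and RNA conditioning are not modelled. All function names are hypothetical.

```python
import math
import random

# Sketch only: a generic DDPM, NOT scDM's published implementation.


def make_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Assumed linear beta schedule; returns betas, alphas and cumulative alpha-bars."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars = []
    running = 1.0
    for a in alphas:
        running *= a  # alpha_bar_t = product of alpha_s for s <= t
        alpha_bars.append(running)
    return betas, alphas, alpha_bars


def q_sample(x0, t, eps, alpha_bars):
    """Forward process: noise a clean vector x0 directly to step t in closed form."""
    ab = alpha_bars[t]
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]


def predict_x0(xt, t, eps_hat, alpha_bars):
    """Invert q_sample given a (predicted) noise vector eps_hat."""
    ab = alpha_bars[t]
    return [(x - math.sqrt(1.0 - ab) * e) / math.sqrt(ab) for x, e in zip(xt, eps_hat)]


def p_step(xt, t, eps_hat, betas, alphas, alpha_bars, z=None):
    """One reverse (denoising) step x_t -> x_{t-1}, assuming sigma_t^2 = beta_t.

    In a conditional model, eps_hat would come from a trained network that also
    sees the cell's RNA profile (hypothetical here; not shown).
    """
    coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
    mean = [(x - coef * e) / math.sqrt(alphas[t]) for x, e in zip(xt, eps_hat)]
    if t == 0:
        return mean  # no noise is added at the final step
    if z is None:
        z = [random.gauss(0.0, 1.0) for _ in xt]
    sigma = math.sqrt(betas[t])
    return [m + sigma * n for m, n in zip(mean, z)]
```

To generate a prediction under these assumptions, one would start from pure Gaussian noise and apply `p_step` from t = T − 1 down to t = 0, obtaining `eps_hat` at each step from the (hypothetical) RNA-conditioned denoising network. Training such a network typically draws a random t, noises a measured protein vector with `q_sample`, and regresses the network output onto the true noise `eps`.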