Key Gene Mining in Transcriptional Regulation for Specific Biological Processes with Small Sample Sizes Using Multi-network pipeline Transformer

Bibliographic Details
Published in arXiv.org
Main Authors Huang, Kerui; Tian, Jianhong; Sun, Lei; Zeng, Li; Xie, Peng; Deng, Aihua; Mo, Ping; Zhou, Zhibo; Jiang, Ming; Wang, Yun; Jiang, Xiaocheng
Format Paper
Language English
Published Ithaca: Cornell University Library, arXiv.org, 07.08.2023
Summary: Gene mining is an important topic in the life sciences, but traditional machine learning methods cannot account for the regulatory relationships between genes, and deep learning methods perform poorly with small sample sizes. This study proposes a deep learning method, TransGeneSelector, that can mine critical regulatory genes involved in specific life processes from a small-sample transcriptome dataset. The method combines a WGAN-GP data augmentation network, a sample filtering network, and a Transformer classifier network. It successfully classified the state (germinating or dry) of Arabidopsis thaliana seeds in a dataset of 79 samples, with performance comparable to that of Random Forests. Using the SHapley Additive exPlanations (SHAP) method, TransGeneSelector then mined genes involved in seed germination. Gene regulatory network construction, KEGG enrichment analysis, and RT-qPCR quantitative analysis confirmed that these genes sit at a more upstream regulatory level than those mined by Random Forests. The top 11 genes mined uniquely by TransGeneSelector were found to be related to the KAI2 signaling pathway, which is of great regulatory importance for germination-related genes. This study provides a practical tool for life science researchers to mine key genes from transcriptome data.
ISSN: 2331-8422
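
Illustrative note: the summary states that genes are mined by applying SHapley Additive exPlanations to a trained classifier. The sketch below is not the authors' implementation and does not use the SHAP library. It computes exact Shapley values by enumerating gene coalitions, which is only feasible for a handful of features, and then ranks genes by mean absolute attribution. The scoring function `toy_classifier`, the gene names, the expression values, and the all-zero baseline are hypothetical stand-ins chosen for illustration.

```python
from itertools import combinations
from math import factorial


def exact_shapley(predict, sample, baseline):
    """Exact Shapley values for one sample.

    Genes outside a coalition are set to their baseline expression value.
    """
    n = len(sample)

    def value(coalition):
        x = [sample[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size (n players in total)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def rank_genes(predict, samples, baseline, gene_names):
    """Rank genes by mean absolute Shapley value across samples."""
    totals = [0.0] * len(gene_names)
    for sample in samples:
        for i, p in enumerate(exact_shapley(predict, sample, baseline)):
            totals[i] += abs(p)
    means = [t / len(samples) for t in totals]
    return sorted(zip(gene_names, means), key=lambda kv: kv[1], reverse=True)


def toy_classifier(x):
    # Hypothetical score: gene_a acts alone and jointly with gene_b;
    # gene_c is ignored by the model.
    return 2.0 * x[0] + x[0] * x[1]


gene_names = ["gene_a", "gene_b", "gene_c"]  # hypothetical names
samples = [[1.0, 1.0, 5.0], [2.0, 0.5, -3.0]]  # made-up expression values
baseline = [0.0, 0.0, 0.0]  # assumed reference expression

ranking = rank_genes(toy_classifier, samples, baseline, gene_names)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

Exact enumeration grows exponentially with the number of genes, so a transcriptome-scale analysis would need an approximate explainer instead; the ranking step by mean absolute attribution would stay the same.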