On discovery of novel hub genes for ER+ and TN breast cancer types through RNA seq data analyses and classification models
Published in | Scientific reports Vol. 14; no. 1; pp. 20840 - 15 |
---|---|
Format | Journal Article |
Language | English |
Published | London: Nature Publishing Group UK, 06.09.2024; Nature Publishing Group; Nature Portfolio |
Summary: | Breast cancer (BC) is a malignant neoplasm classified into types defined by underlying molecular factors: estrogen receptor positive (ER+), progesterone receptor positive (PR+), human epidermal growth factor receptor 2 positive (HER2+) and triple negative (TNBC). Early detection of ER+ and TNBC is crucial for diagnosis and for choosing an appropriate treatment strategy. Here we report the key genes associated with ER+ and TNBC using RNA-Seq analysis and machine learning models. Three ER+ and TNBC RNA-Seq datasets comprising 164 patients in total were selected for standard hierarchical NGS data processing and data analysis protocols. Pathway enrichment analysis and network analysis were performed, and the top hub genes were identified. To arrive at a reliable classifier that could distinguish the distinct transcriptome patterns associated with ER+ and TNBC, ML models were built using Naïve Bayes, support vector machine (SVM) and k-nearest neighbor (kNN) algorithms. A total of 1730 common differentially expressed genes (DEGs) exhibiting significant logFC values at a p-value threshold of 0.05 were identified. The top ten hub genes, screened on the basis of maximal clique centrality (MCC), were CDC20, CDK1, BUB1, AURKA, CDCA8, RRM2, TTK, CENPF, CEP55 and NDC80. These genes were found to be involved in crucial cell cycle pathways. The kNN model was the best classifier, differentiating ER+ from TNBC RNA-Seq transcriptomes with 84% accuracy, 66% specificity and 95% sensitivity. Our screened list of 10 hub genes can thus help uncover novel molecular signatures implicated in ER+ and TNBC onset and prognosis, and inform the design of novel protocols for breast cancer diagnostics and therapeutics. |
---|---|
ISSN: | 2045-2322 |
DOI: | 10.1038/s41598-024-69721-9 |
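
The summary reports classifier performance as accuracy, specificity and sensitivity. As a reading aid, the sketch below shows how these three figures are conventionally derived from true and predicted labels for a two-class problem. It is not the authors' code: the function name `binary_metrics`, the choice of ER+ as the "positive" class, and the toy labels are all assumptions made for illustration, and the toy labels are not data from the study.

```python
import math
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[str], y_pred: Sequence[str],
                   positive: str) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for a two-class problem.

    `positive` names the class treated as positive (an assumption here;
    the summary does not say which class the authors used).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else math.nan

    return {
        "accuracy": ratio(tp + tn, len(pairs)),   # correct / all samples
        "sensitivity": ratio(tp, tp + fn),        # true positive rate
        "specificity": ratio(tn, tn + fp),        # true negative rate
    }


# Hypothetical toy labels, for illustration only (not study data).
toy_true = ["ER+", "ER+", "ER+", "ER+", "TNBC", "TNBC", "TNBC", "TNBC"]
toy_pred = ["ER+", "ER+", "ER+", "TNBC", "TNBC", "TNBC", "ER+", "ER+"]
toy_metrics = binary_metrics(toy_true, toy_pred, positive="ER+")
```

A high sensitivity paired with a lower specificity, as in the reported kNN figures, means the positive class is rarely missed while a larger share of the negative class is misassigned to it.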