Binary hiking optimization for gene selection: Insights from HNSCC RNA-Seq data
Published in | Expert systems with applications Vol. 268; p. 126404 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 05.04.2025 |
Summary: | • Introduced binary HOA for gene selection in high-dimensional biomedical data.
• Enhanced binary HOA with crossover and mutation mechanisms to boost exploration.
• Leveraged logit-SPLS model, surpassing SVM and RF in cancer prediction accuracy.
• Identified 7-gene subset with 99% AUC on the GSE6631 dataset for HNSCC.
• Identified TGFBR3, HLF, PTK7, SPINK5, CDK1, CKS1B as prognostic biomarkers in HNSCC.
High dimensionality in gene expression data is a common challenge that makes identifying disease markers computationally difficult. To address this issue, this study proposes a novel hybrid gene selection method that incorporates two variants of the Hiking Optimization Algorithm (HOA): the Binary HOA (BHOA) and an enhanced version, BHOA-CM. HOA simulates the adaptive behavior of hikers, who adjust their pace based on terrain slope to reach a summit efficiently. BHOA and BHOA-CM employ a hyperbolic tangent transfer function to map continuous values to binary outputs. BHOA-CM also incorporates one-point crossover and self-adaptive mutation operators to enhance the algorithm’s exploitative capabilities in gene selection. The approach begins with differentially expressed gene (DEG) analysis to identify relevant genes. BHOA and BHOA-CM are then applied with a hybrid classifier, logit-SPLS, which combines Adaptive Sparse Partial Least Squares (SPLS) with Logistic Regression, to optimize gene selection performance. Experimental results on six benchmark microarray datasets show that the proposed method outperforms recent state-of-the-art techniques in classification accuracy while selecting fewer marker genes. The method is also applied to real head and neck squamous cell carcinoma (HNSCC) RNA-Seq data and to online handwriting data from Alzheimer’s disease (AD) patients in the DARWIN dataset, highlighting the algorithm’s adaptability to diverse biomedical challenges. The BHOA and BHOA-CM methods show significant promise for feature selection in high-dimensional gene expression data analysis, effectively identifying key marker genes for cancer diagnosis and prognosis. |
---|---|
ISSN: | 0957-4174 |
DOI: | 10.1016/j.eswa.2025.126404 |
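
The summary mentions three mechanisms: a hyperbolic tangent transfer function that turns continuous positions into binary gene masks, one-point crossover, and mutation. The following is a minimal illustrative sketch of those ideas, not the authors' implementation. It assumes a V-shaped rule in which gene i is selected with probability |tanh(x_i)|, and it substitutes a plain fixed-rate bit-flip for the self-adaptive mutation, because the record does not state either rule. All function names and parameters are hypothetical.

```python
import math
import random


def tanh_transfer(x):
    """Assumed V-shaped transfer: map a real value to a probability in [0, 1]."""
    return abs(math.tanh(x))


def binarize(position, rng):
    """Turn a continuous position into a gene mask.

    Assumption: gene i is selected (bit 1) with probability |tanh(x_i)|.
    """
    return [1 if rng.random() < tanh_transfer(x) else 0 for x in position]


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length masks at one random cut point."""
    cut = rng.randint(1, len(a) - 1)  # cut strictly inside the vector
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def bit_flip_mutation(bits, rate, rng):
    """Flip each bit independently with probability `rate`.

    Stand-in for the self-adaptive mutation, whose adaptation rule
    is not given in this record.
    """
    return [1 - b if rng.random() < rate else b for b in bits]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two hypothetical hiker positions over eight candidate genes.
    pos_a = [rng.uniform(-2, 2) for _ in range(8)]
    pos_b = [rng.uniform(-2, 2) for _ in range(8)]
    mask_a, mask_b = binarize(pos_a, rng), binarize(pos_b, rng)
    child_a, child_b = one_point_crossover(mask_a, mask_b, rng)
    print(bit_flip_mutation(child_a, 0.1, rng))
    print(bit_flip_mutation(child_b, 0.1, rng))
```

In a full pipeline each mask would be scored by a classifier on the selected genes (the record names logit-SPLS); that fitness step is omitted here.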