Binary hiking optimization for gene selection: Insights from HNSCC RNA-Seq data

Bibliographic Details
Published in	Expert systems with applications Vol. 268; p. 126404
Main Authors	Pashaei, Elnaz, Pashaei, Elham, Mirjalili, Seyedali
Format	Journal Article
Language	English
Published	Elsevier Ltd 05.04.2025
Subjects	Adaptive mutation; Classification; Crossover; Gene selection; Hiking optimization algorithm; HNSCC
Summary:
• Introduced binary HOA for gene selection in high-dimensional biomedical data.
• Enhanced binary HOA with crossover and mutation mechanisms to boost exploration.
• Leveraged logit-SPLS model, surpassing SVM and RF in cancer prediction accuracy.
• Identified 7-gene subset with 99% AUC on the GSE6631 dataset for HNSCC.
• Identified TGFBR3, HLF, PTK7, SPINK5, CDK1, CKS1B as prognostic biomarkers in HNSCC.

High dimensionality in gene expression data is a common challenge that creates significant computational difficulties in identifying disease markers. To address this issue, this study proposes a novel hybrid gene selection method that incorporates two variants of the Hiking Optimization Algorithm (HOA): the Binary HOA (BHOA) and an enhanced version, BHOA-CM. HOA simulates the adaptive behavior of hikers, who adjust their pace based on terrain slope to reach a summit efficiently. BHOA and BHOA-CM employ a hyperbolic tangent transfer function to map continuous values to binary outputs. BHOA-CM also incorporates one-point crossover and self-adaptive mutation operators to enhance the algorithm’s exploitative capabilities in gene selection. The approach begins with Differentially Expressed Genes (DEGs) analysis to identify relevant genes. BHOA and BHOA-CM are then applied with a hybrid classifier that combines Adaptive Sparse Partial Least Squares (SPLS) and Logistic Regression (logit-SPLS) to optimize gene selection performance. Experimental results on six benchmark microarray datasets demonstrate that the proposed method outperforms recent state-of-the-art techniques in classification accuracy while selecting fewer marker genes. Additionally, the method is applied to real head and neck squamous cell carcinoma (HNSCC) RNA-Seq data and to online handwriting data from Alzheimer’s disease (AD) patients in the DARWIN dataset, highlighting the algorithm’s adaptability to diverse biomedical challenges. The BHOA and BHOA-CM methods demonstrate significant promise for feature selection in high-dimensional gene expression data analysis, effectively identifying key marker genes for cancer diagnosis and prognosis.
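Illustrative note: the summary names a hyperbolic tangent transfer function and one-point crossover but does not give their exact form. The Python sketch below shows one common way such operators are written for binary gene-selection masks. It is an assumption-laden illustration, not the authors' implementation: the V-shaped rule |tanh(x)|, the stochastic thresholding, and the function names `tanh_transfer`, `binarize`, and `one_point_crossover` are all hypothetical choices made here.

```python
import math
import random


def tanh_transfer(x):
    """Assumed V-shaped transfer: map a real value to a probability in [0, 1]."""
    return abs(math.tanh(x))


def binarize(position, rng):
    """Turn a continuous position vector into a 0/1 gene mask.

    Gene j is selected (bit = 1) when a uniform draw in [0, 1) falls
    below |tanh(x_j)|. This thresholding rule is an assumption.
    """
    return [1 if rng.random() < tanh_transfer(x) else 0 for x in position]


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two equal-length bit vectors at a random cut point."""
    if len(parent_a) != len(parent_b) or len(parent_a) < 2:
        raise ValueError("parents must have equal length of at least 2")
    cut = rng.randint(1, len(parent_a) - 1)  # cut lies strictly inside the vector
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


if __name__ == "__main__":
    rng = random.Random(42)
    mask_a = binarize([0.1, -2.5, 0.0, 1.3, -0.4], rng)
    mask_b = binarize([1.9, 0.2, -0.7, 0.0, 3.1], rng)
    print(mask_a, mask_b)
    print(one_point_crossover(mask_a, mask_b, rng))
```

In this sketch a position value of 0 never selects its gene, while large magnitudes of either sign select it almost surely. That is the usual behavior of a V-shaped transfer function, though the paper may use a different update rule. The self-adaptive mutation operator is omitted because the summary gives no detail on how its rate is adapted.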
ISSN:	0957-4174
DOI:	10.1016/j.eswa.2025.126404