Simulated annealing aided genetic algorithm for gene selection from microarray data

Bibliographic Details
Published in Computers in biology and medicine Vol. 158; p. 106854
Main Authors Marjit, Shyam, Bhattacharyya, Trinav, Chatterjee, Bitanu, Sarkar, Ram
Format Journal Article
Language English
Published United States Elsevier Ltd 01.05.2023
Elsevier Limited

Summary: In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness to identify different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-markers. Consequently, a significant amount of data is redundant, and it is essential to filter out important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA utilizes a two-way mutation-based Simulated Annealing (SA) as well as Genetic Algorithm (GA) to ensure a good trade-off between exploitation and exploration of the search space, respectively. The naive version of GA often gets stuck in a local optimum and depends on the initial population, leading to premature convergence. To address this, we have blended a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance the performance, we reduce the initial search space by a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. Comparison of SAGA with contemporary algorithms has shown that SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
• Application of Simulated Annealing aided Genetic Algorithm to solve the FS problem.
• Introduced a new multi-objective fitness function to evaluate a feature subset.
• Proposal of a new acceptance probability function in SA and enhancements in GA.
• Use of initial feature dropping using MICC for microarray and omics datasets.
• Clustering-based population initialization to avoid premature convergence of SAGA.
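The general idea in the summary, a GA exploring feature subsets while SA refines individual candidates, can be illustrated with a minimal Python sketch. This is not the authors' SAGA implementation (see their repository for that). It assumes a toy fitness function, a plain single bit-flip neighbourhood, the standard Metropolis acceptance rule, and random population initialization in place of the paper's multi-objective fitness, two-way mutation, new acceptance probability, MICC filter, and clustering-based initialization. All function and parameter names (`fitness`, `sa_refine`, `sa_aided_ga`, `relevant`, `penalty`) are hypothetical.

```python
import math
import random


def fitness(mask, relevant, penalty=0.01):
    """Toy fitness (an assumption): reward picking 'relevant' features, penalize subset size."""
    hits = sum(1 for i in relevant if mask[i])
    return hits / len(relevant) - penalty * sum(mask)


def sa_refine(mask, relevant, rng, temp=1.0, cooling=0.9, steps=50):
    """Refine one feature mask with standard Metropolis simulated annealing."""
    current, cur_fit = mask[:], fitness(mask, relevant)
    best, best_fit = current[:], cur_fit
    for _ in range(steps):
        cand = current[:]
        j = rng.randrange(len(cand))
        cand[j] = 1 - cand[j]  # single bit-flip neighbour
        cand_fit = fitness(cand, relevant)
        delta = cand_fit - cur_fit
        # Always accept improvements; accept worse moves with probability exp(delta / temp).
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, cur_fit = cand, cand_fit
            if cur_fit > best_fit:
                best, best_fit = current[:], cur_fit
        temp *= cooling  # geometric cooling schedule
    return best


def sa_aided_ga(n_features, relevant, pop_size=10, generations=20, seed=0):
    """Elitist GA over binary feature masks; each child is refined by SA."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, relevant), reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            children.append(sa_refine(a[:cut] + b[cut:], relevant, rng))
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, relevant))


best = sa_aided_ga(n_features=20, relevant=[1, 5, 7])
print(best, fitness(best, [1, 5, 7]))
```

In this sketch the GA supplies exploration through crossover across the population, while the SA step exploits the neighbourhood of each child. A real gene-selection setup would replace the toy fitness with a classifier-based score on the expression data.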
ISSN: 0010-4825, 1879-0534
DOI: 10.1016/j.compbiomed.2023.106854