Simulated annealing aided genetic algorithm for gene selection from microarray data
In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness to identify different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-mar...
Saved in:
Published in | Computers in biology and medicine Vol. 158; p. 106854 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published |
United States
Elsevier Ltd
01.05.2023
Elsevier Limited |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | In recent times, microarray gene expression datasets have gained significant popularity due to their usefulness to identify different types of cancer directly through bio-markers. These datasets possess a high gene-to-sample ratio and high dimensionality, with only a few genes functioning as bio-markers. Consequently, a significant amount of data is redundant, and it is essential to filter out important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA utilizes a two-way mutation-based Simulated Annealing (SA) as well as Genetic Algorithm (GA) to ensure a good trade-off between exploitation and exploration of the search space, respectively. The naive version of GA often gets stuck in a local optimum and depends on the initial population, leading to premature convergence. To address this, we have blended a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance the performance, we reduce the initial search space by a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. Comparison of SAGA with contemporary algorithms has shown that SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
•Application of Simulated Annealing aided Genetic Algorithm to solve the FS problem.•Introduced a new multi-objective fitness function to evaluate a feature subset.•Proposal of a new acceptance probability function in SA and enhancements in GA.•Use of initial feature dropping using MICC for microarray and omics datasets.•Clustering-based population initialization to avoid premature convergence of SAGA. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23 |
ISSN: | 0010-4825 1879-0534 |
DOI: | 10.1016/j.compbiomed.2023.106854 |