Simulated annealing aided genetic algorithm for gene selection from microarray data

Bibliographic Details
Published in	Computers in biology and medicine Vol. 158; p. 106854
Main Authors	Marjit, Shyam; Bhattacharyya, Trinav; Chatterjee, Bitanu; Sarkar, Ram
Format	Journal Article
Language	English
Published	United States: Elsevier Ltd, 01.05.2023
Subjects	Algorithms; Biomarkers; Cluster Analysis; Clustering; Correlation coefficient; Correlation coefficients; Datasets; DNA microarrays; Exploitation; Feature selection; Gene expression; Genes; Genetic algorithm; Genetic algorithms; Heuristic methods; Humans; Microarray dataset; Neoplasms - genetics; Oligonucleotide Array Sequence Analysis - methods; Optimization algorithm; Simulated annealing; Simulation
Summary:	Microarray gene expression datasets have recently gained significant popularity because they help identify different types of cancer directly through bio-markers. These datasets are high-dimensional, with a high gene-to-sample ratio, and only a few genes function as bio-markers. Consequently, much of the data is redundant, and it is essential to select the important genes carefully. In this paper, we propose the Simulated Annealing aided Genetic Algorithm (SAGA), a meta-heuristic approach to identify informative genes from high-dimensional datasets. SAGA combines a two-way mutation-based Simulated Annealing (SA) for exploitation with a Genetic Algorithm (GA) for exploration, to ensure a good trade-off between the two across the search space. A naive GA depends heavily on its initial population and often gets stuck in a local optimum, leading to premature convergence. To address this, we blend a clustering-based population generation with SA to distribute the initial population of GA over the entire feature space. To further enhance performance, we reduce the initial search space with a score-based filter approach called the Mutually Informed Correlation Coefficient (MICC). The proposed method is evaluated on 6 microarray and 6 omics datasets. In comparisons with contemporary algorithms, SAGA performs much better than its peers. Our code is available at https://github.com/shyammarjit/SAGA.
• Application of Simulated Annealing aided Genetic Algorithm to solve the feature selection (FS) problem.
• Introduction of a new multi-objective fitness function to evaluate a feature subset.
• Proposal of a new acceptance probability function in SA and enhancements in GA.
• Use of initial feature dropping using MICC for microarray and omics datasets.
• Clustering-based population initialization to avoid premature convergence of SAGA.
ISSN:	0010-4825, 1879-0534
DOI:	10.1016/j.compbiomed.2023.106854
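
The hybrid described in the Summary, a GA that explores the space of gene subsets while SA refines individual candidates, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked in the Summary for that). Everything here is an assumption made for illustration: the toy `fitness` function stands in for the paper's multi-objective fitness, the standard Metropolis rule stands in for the paper's new acceptance probability function, a single bit flip stands in for the two-way mutation, the population is initialized at random rather than by clustering, and MICC filtering is omitted. The names `fitness`, `anneal`, and `saga_sketch` are hypothetical.

```python
import math
import random


def fitness(mask, relevant, alpha=0.9):
    """Toy stand-in fitness (assumption): reward overlap with a known set of
    'relevant' gene indices and, with weight 1 - alpha, reward small subsets."""
    selected = {i for i, bit in enumerate(mask) if bit}
    if not selected:
        return 0.0
    hit_rate = len(selected & relevant) / len(relevant)
    sparsity = 1.0 - len(selected) / len(mask)
    return alpha * hit_rate + (1.0 - alpha) * sparsity


def anneal(mask, score, rng, t0=1.0, cooling=0.95, steps=50):
    """SA local search: flip one bit per step and accept the move with the
    standard Metropolis rule (not the paper's acceptance function)."""
    current, cur_s = mask[:], score(mask)
    best, best_s = current[:], cur_s
    t = t0
    for _ in range(steps):
        cand = current[:]
        j = rng.randrange(len(cand))
        cand[j] = 1 - cand[j]
        cand_s = score(cand)
        # Always accept improvements; accept worse moves with prob e^(delta/t).
        if cand_s >= cur_s or rng.random() < math.exp((cand_s - cur_s) / t):
            current, cur_s = cand, cand_s
            if cur_s > best_s:
                best, best_s = current[:], cur_s
        t *= cooling  # geometric cooling schedule
    return best


def saga_sketch(n_features, score, pop_size=10, generations=20, seed=0):
    """GA loop with elitist selection and one-point crossover; each child is
    refined by SA before joining the next generation."""
    rng = random.Random(seed)
    # Random initialization (the paper uses clustering-based generation).
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[: pop_size // 2]  # keep the better half unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)
            children.append(anneal(a[:cut] + b[cut:], score, rng))
        pop = parents + children
    return max(pop, key=score)


relevant_genes = {0, 3, 7}  # hypothetical "informative" gene indices
best = saga_sketch(12, lambda m: fitness(m, relevant_genes))
print(best, round(fitness(best, relevant_genes), 3))
```

In this sketch the GA's crossover over a population supplies exploration, while the SA pass on each child supplies exploitation; the retained parents guarantee that the best score in the population never decreases between generations.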