Genetic algorithm-based feature selection with manifold learning for cancer classification using microarray data

Bibliographic Details
Published in	BMC bioinformatics Vol. 24; no. 1; pp. 139 - 22
Main Authors	Wang, Zixuan; Zhou, Yi; Takagi, Tatsuya; Song, Jiangning; Tian, Yu-Shi; Shibuya, Tetsuo
Format	Journal Article
Language	English
Published	London, BioMed Central, 08.04.2023
Subjects	Accuracy; Algorithms; Analysis; Bioinformatics; Biomedical and Life Sciences; Cancer; Cancer classification; Classification; Computational Biology/Bioinformatics; Computer Appl. in Life Sciences; Data mining; Datasets; DNA microarrays; Euclidean space; Feature selection; Gene expression; Gene Expression Profiling - methods; Gene selection; Genes; Genetic algorithm; Genetic algorithms; Genetic aspects; Genetic research; Genetic Techniques; Health aspects; Heuristic; High-throughput screening (Biochemical assaying); Humans; Hybrids; Life Sciences; Machine learning; Manifold algorithm; Manifolds (mathematics); Methods; Microarray Analysis - methods; Microarray data; Microarrays; Neoplasms - classification; Neoplasms - genetics; Optimization algorithms; Performance evaluation; Probability; Australia
ISSN	1471-2105
DOI	10.1186/s12859-023-05267-3

More Information
Summary:	Background: Microarray data have been widely utilized for cancer classification. The main characteristic of microarray data is “large p and small n”: the data contain a small number of subjects but a large number of genes. This may affect the validity of the classification. Thus, there is a pressing demand for techniques that can select genes relevant to cancer classification. Results: This study proposed a novel feature (gene) selection method, Iso-GA, for cancer classification. Iso-GA hybridizes the manifold learning algorithm Isomap with the genetic algorithm (GA) to account for the latent nonlinear structure of the gene expression in the microarray data. The Davies–Bouldin index is adopted to evaluate the candidate solutions in Isomap and to avoid the classifier dependency problem. Additionally, a probability-based framework is introduced to reduce the possibility of genes being randomly selected by GA. The performance of Iso-GA was evaluated on eight benchmark microarray datasets of cancers. Iso-GA outperformed the other benchmark gene selection methods, achieving good classification accuracy with fewer critical genes selected. Conclusions: The proposed Iso-GA method can effectively select fewer but critical genes from microarray data to achieve competitive classification performance.
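
The summary says candidate gene subsets are scored with the Davies–Bouldin index rather than with a classifier. The sketch below shows, under stated assumptions, what such a classifier-independent score could look like. It is not the authors' implementation: the record does not give the exact fitness function, the Isomap embedding step is omitted here, and the names `davies_bouldin`, `subset_fitness`, and the toy data are hypothetical. The class labels are assumed to serve as the group assignments.

```python
import math

def davies_bouldin(points, labels):
    """Davies-Bouldin index; lower values mean tighter, better-separated groups.

    Assumes at least two groups with distinct centroids.
    """
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    keys = sorted(groups)
    # Centroid of each group (coordinate-wise mean).
    cents = {k: [sum(col) / len(groups[k]) for col in zip(*groups[k])] for k in keys}
    # Scatter: mean distance of a group's members to its centroid.
    scat = {k: sum(math.dist(p, cents[k]) for p in groups[k]) / len(groups[k])
            for k in keys}
    total = 0.0
    for i in keys:
        # Worst-case similarity of group i to any other group.
        total += max((scat[i] + scat[j]) / math.dist(cents[i], cents[j])
                     for j in keys if j != i)
    return total / len(keys)

def subset_fitness(X, y, mask):
    """Score a candidate gene subset (a 0/1 mask over columns of X).

    Assumption: the index is computed directly on the selected columns;
    the method in the summary would first embed them with Isomap.
    """
    selected = [[v for v, keep in zip(row, mask) if keep] for row in X]
    return davies_bouldin(selected, y)

# Toy data: 4 subjects x 2 genes; gene 0 separates the classes, gene 1 does not.
X = [[0.0, 5.0], [1.0, 1.0], [10.0, 4.0], [11.0, 3.0]]
y = [0, 0, 1, 1]
fit_informative = subset_fitness(X, y, [1, 0])
fit_noisy = subset_fitness(X, y, [0, 1])
```

On this toy data the mask that keeps only the informative gene gets the lower (better) score, so a GA minimizing this value would tend to prefer it.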