Improved swarm-optimization-based filter-wrapper gene selection from microarray data for gene expression tumor classification

Bibliographic Details
Published in Pattern analysis and applications : PAA Vol. 26; no. 2; pp. 455-472
Main Authors Ke, Lin; Li, Min; Wang, Lei; Deng, Shaobo; Ye, Jun; Yu, Xiang
Format Journal Article
Language English
Published London Springer London 01.05.2023
Springer Nature B.V
Summary: A typical microarray dataset contains thousands of genes but only a small number of samples, and most genes in a DNA microarray dataset are not relevant for classification. Identifying highly discriminating genes, known as biomarkers, is a challenging task for machine learning-based tumor classification. This study focuses on swarm-optimization-based filter-wrapper gene selection. In general, this type of hybrid gene selection consists of two steps. The first is the filter step, which ranks the genes, keeps a small top-n percentage of them, and yields a reduced dataset. The second is the wrapper step, which uses a swarm-optimization-based algorithm to search the remaining genes for the optimal gene subset. In existing swarm-optimization-based filter-wrapper methods, however, the second step searches the remaining genes without using their ranking information. This study attempts to fill that gap. It proposes population initialization based on ranking criteria (PIRC) to modify the population initialization of the genetic algorithm (GA) and ant colony optimization (ACO); the resulting variants are called PIRCGA and PIRCACO, respectively. Experiments were carried out on 17 microarray expression datasets, comparing two pairs of methods: IG-GA vs. IG-PIRCGA and IG-ACO vs. IG-PIRCACO. The experimental results demonstrate the efficiency of the proposed methods.
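The two-step idea in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the summary does not say how PIRC converts ranks into an initial population, so the linear inclusion-probability schedule, the function names `top_fraction` and `rank_biased_population`, the parameters `p_high` and `p_low`, and the toy scores are all assumptions made for illustration. The scores stand in for a filter criterion such as information gain.

```python
import random


def top_fraction(scores, fraction):
    """Filter step: indices of the top `fraction` of genes, best score first."""
    n_keep = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_keep]


def rank_biased_population(n_genes, pop_size, p_high=0.9, p_low=0.1, rng=None):
    """Build binary chromosomes over genes that are ordered best-first.

    Position 0 is the top-ranked gene. The probability of switching a gene
    on falls linearly from p_high to p_low with its rank. This schedule is
    an assumption, not the scheme described in the paper.
    """
    rng = rng or random.Random()
    population = []
    for _ in range(pop_size):
        chromosome = []
        for rank in range(n_genes):
            frac = rank / (n_genes - 1) if n_genes > 1 else 0.0
            p_include = p_high - (p_high - p_low) * frac
            chromosome.append(1 if rng.random() < p_include else 0)
        population.append(chromosome)
    return population


# Toy filter scores for 10 genes (made-up numbers, not real data).
scores = [0.05, 0.90, 0.10, 0.75, 0.30, 0.60, 0.02, 0.45, 0.20, 0.85]

# Step 1: keep the top 50% of genes by score.
kept = top_fraction(scores, 0.5)

# Step 2 (initialization only): seed a wrapper search with rank-aware individuals.
population = rank_biased_population(len(kept), pop_size=200, rng=random.Random(0))
```

A conventional initialization would switch every retained gene on with the same probability. The rank-aware version starts the wrapper search from subsets that favor highly ranked genes, which is the kind of ranking information the summary says existing methods leave unused.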
ISSN: 1433-7541
1433-755X
DOI: 10.1007/s10044-022-01117-9