Ensemble gene selection by grouping for microarray data classification

Bibliographic Details
Published in	Journal of biomedical informatics Vol. 43; no. 1; pp. 81 - 87
Main Authors	Liu, Huawen; Liu, Lei; Zhang, Huijie
Format	Journal Article
Language	English
Published	United States: Elsevier Inc, 01.02.2010
Subjects	Algorithms; Artificial Intelligence; Automatic Data Processing; Cell Line, Tumor; Classification; Computational Biology - methods; Computer Simulation; Databases, Genetic - classification; Ensemble learning; Gene Expression Profiling - methods; Gene Expression Regulation, Neoplastic; Gene selection; Humans; Information metric; Markov blanket; Markov Chains; Microarray analysis; Models, Statistical; Oligonucleotide Array Sequence Analysis - methods; Pattern Recognition, Automated - classification; Pattern Recognition, Automated - methods; Reproducibility of Results

More Information
Summary:	Selecting relevant and discriminative genes for sample classification is a common and critical task in gene expression analysis (e.g. disease diagnosis). Ideally, gene selection should effectively improve the classification performance of a learning algorithm. Most widely used gene selection methods choose a single gene subset according to its discriminative power. One deficiency of a single gene subset is that its contribution to classification is limited. Ensemble gene selection based on random selection alleviates this issue to some extent. However, the random approach requires an unnecessarily large number of candidate gene subsets, and its reliability is a problem. In this study, we propose a new ensemble method, called ensemble gene selection by grouping (EGSG), to select multiple gene subsets for classification. Rather than selecting randomly, our method chooses salient gene subsets from microarray data by means of information theory and the approximate Markov blanket. The effectiveness and accuracy of our method are validated by experiments on five publicly available microarray data sets. The experimental results show that our ensemble gene selection method achieves classification performance comparable to that of other gene selection methods and is more stable than the random one.
ISSN:	1532-0464 1532-0480
DOI:	10.1016/j.jbi.2009.08.010
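
The summary describes grouping genes with an information-theoretic metric and an approximate Markov blanket, then drawing several gene subsets from the groups. The sketch below illustrates that general idea only; it is not the authors' EGSG procedure, whose details are not given in this record. It assumes discretized expression values, uses symmetrical uncertainty (SU) as the information metric, and uses the FCBF-style rule that gene A is an approximate Markov blanket for gene B when SU(A, class) >= SU(B, class) and SU(A, B) >= SU(B, class). All function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(xs):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def symmetrical_uncertainty(xs, ys):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    if hx + hy == 0:
        return 0.0
    mutual_info = hx + hy - entropy(list(zip(xs, ys)))
    return 2.0 * mutual_info / (hx + hy)


def group_genes(genes, labels, min_relevance=0.0):
    """Partition relevant genes into groups of mutually redundant genes.

    genes: dict mapping gene name -> list of discretized expression values.
    Genes whose SU with the class is <= min_relevance are dropped (assumption).
    """
    relevance = {g: symmetrical_uncertainty(v, labels) for g, v in genes.items()}
    remaining = sorted(
        (g for g in genes if relevance[g] > min_relevance),
        key=lambda g: relevance[g],
        reverse=True,
    )
    groups = []
    while remaining:
        leader, rest = remaining[0], remaining[1:]
        group, remaining = [leader], []
        for g in rest:
            # The leader is at least as relevant as g (sorted order), so it is
            # an approximate Markov blanket for g when SU(leader, g) >= SU(g, class).
            if symmetrical_uncertainty(genes[leader], genes[g]) >= relevance[g]:
                group.append(g)
            else:
                remaining.append(g)
        groups.append(group)
    return groups


def ensemble_subsets(groups, n_subsets):
    """Build n_subsets gene subsets by taking the k-th ranked gene of each group.

    Groups with fewer than k + 1 genes contribute their last member.
    """
    return [[grp[min(k, len(grp) - 1)] for grp in groups] for k in range(n_subsets)]
```

Under these assumptions, each subset returned by `ensemble_subsets` could train one base classifier, with the predictions combined by a scheme such as majority voting. Because the subsets are derived from the groups rather than sampled at random, repeated runs on the same data yield the same subsets.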