Cancer Classification from Gene Expression Data by NPPC Ensemble

Bibliographic Details
Published in	IEEE/ACM Transactions on Computational Biology and Bioinformatics, Vol. 8, No. 3, pp. 659-671
Main Authors	Ghorai, S.; Mukherjee, A.; Sengupta, S.; Dutta, P. K.
Format	Journal Article
Language	English
Published	United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), May 2011
Subjects	Accuracy; Algorithms; Application software; Artificial Intelligence; Cancer; Cancer classification; Classification; classifier ensemble; Classifiers; combination of multiple classifiers; Computational Biology - methods; Databases, Genetic; Diagnosis; Diseases; Filters; Gene expression; Gene Expression Profiling - methods; Genetic algorithms; Genetics; Humans; microarray data analysis; Mutual information; Neoplasms - classification; Neoplasms - genetics; Neoplasms - metabolism; Oligonucleotide Array Sequence Analysis - methods; proximal classifier; Reproducibility of Results; Support vector machine classification; Support vector machines; Testing

More Information
Summary:	The most important application of microarrays in gene expression analysis is to classify unknown tissue samples according to their gene expression levels, with the help of samples whose classes are known. In this paper, we present a nonparallel plane proximal classifier (NPPC) ensemble that achieves higher classification accuracy on test samples in a computer-aided diagnosis (CAD) framework than a single NPPC model. For each data set, only a few genes are selected using a mutual information criterion. Then a genetic algorithm-based simultaneous feature and model selection scheme is used to train a number of NPPC expert models in multiple subspaces by maximizing cross-validation accuracy. Ensemble members are selected according to the performance of the trained models on a validation set. Besides the usual majority voting method, we introduce a minimum average proximity-based decision combiner for the NPPC ensemble. The effectiveness of the NPPC ensemble and of the proposed decision-combination approach for cancer diagnosis is studied and compared with a support vector machine (SVM) classifier in a similar framework. Experimental results on cancer data sets show that the NPPC ensemble offers testing accuracy comparable to that of an SVM ensemble, with reduced training time on average.
ISSN:	1545-5963; 1557-9964
DOI:	10.1109/TCBB.2010.36
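The summary's first step ranks genes by a mutual information criterion and keeps only a few. The record does not say how expression values are discretized or how many genes are kept, so the sketch below assumes equal-frequency binning (`n_bins=3`) and a caller-chosen `k`; `discretize`, `mutual_information` and `select_genes_by_mi` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def discretize(x, n_bins=3):
    """Map a 1-D expression vector to equal-frequency bins 0..n_bins-1 (assumed scheme)."""
    # Interior quantile edges, e.g. the 1/3 and 2/3 quantiles for n_bins=3.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    return np.searchsorted(edges, x, side="right")

def mutual_information(a, b):
    """Mutual information (in nats) between two discrete label vectors."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1)        # contingency counts
    joint /= joint.sum()                       # joint probabilities
    pa = joint.sum(axis=1, keepdims=True)      # marginal of a
    pb = joint.sum(axis=0, keepdims=True)      # marginal of b
    nz = joint > 0                             # skip 0 * log 0 terms
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))

def select_genes_by_mi(X, y, k, n_bins=3):
    """Return indices of the k genes (columns of X) with the highest MI with y."""
    scores = np.array([mutual_information(discretize(X[:, j], n_bins), y)
                       for j in range(X.shape[1])])
    top = np.argsort(scores)[::-1][:k]
    return top, scores
```

With samples as rows and genes as columns, `select_genes_by_mi(X, y, k=20)` would return the 20 most informative gene indices; the classifier is then trained on `X[:, top]` only.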
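A nonparallel plane proximal classifier, in general terms, fits one plane per class that lies close to its own class and away from the other, then labels a sample by its proximity to the planes. The record does not give the NPPC objective, so the sketch below substitutes the closed-form least-squares twin SVM (LSTSVM) solution as a stand-in nonparallel-plane classifier; it is not the authors' NPPC. The penalties `c1`, `c2` and the small ridge term `eps` are assumptions.

```python
import numpy as np

def fit_two_planes(A, B, c1=1.0, c2=1.0, eps=1e-8):
    """LSTSVM-style stand-in: return u1=[w1, b1] and u2=[w2, b2] for class A and class B."""
    E = np.hstack([A, np.ones((A.shape[0], 1))])   # [A  e1]
    F = np.hstack([B, np.ones((B.shape[0], 1))])   # [B  e2]
    ridge = eps * np.eye(E.shape[1])               # added only for numerical stability
    e1 = np.ones(A.shape[0])
    e2 = np.ones(B.shape[0])
    # Plane 1: close to class A, with class B pushed towards w1.x + b1 = -1.
    u1 = -np.linalg.solve(F.T @ F + (1.0 / c1) * (E.T @ E) + ridge, F.T @ e2)
    # Plane 2: close to class B, with class A pushed towards w2.x + b2 = +1.
    u2 = np.linalg.solve(E.T @ E + (1.0 / c2) * (F.T @ F) + ridge, E.T @ e1)
    return u1, u2

def plane_distances(X, u):
    """Perpendicular distance of each row of X to the plane w.x + b = 0."""
    w, b = u[:-1], u[-1]
    return np.abs(X @ w + b) / np.linalg.norm(w)

def predict(X, u1, u2):
    """Label 0 (class A) or 1 (class B) by the nearer plane; also return the distances."""
    d = np.column_stack([plane_distances(X, u1), plane_distances(X, u2)])
    return np.argmin(d, axis=1), d
```

Because each plane comes from a single linear solve rather than a quadratic program, training is cheap, which is consistent in spirit with the reduced training time the summary reports relative to SVM.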
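The summary describes a genetic algorithm that selects features and model parameters simultaneously by maximizing cross-validation accuracy. The following is a generic binary GA under illustrative assumptions: the chromosome layout (a gene mask followed by bits that index a hypothetical `param_grid` of model parameters), binary tournament selection, one-point crossover, bit-flip mutation and elitism are choices made here, not details taken from the paper. In practice `fitness(mask, param)` would return the cross-validation accuracy of a classifier trained on the masked genes.

```python
import numpy as np

def genetic_search(fitness, n_features, param_grid, pop_size=20, n_gen=30,
                   p_mut=0.05, seed=0):
    """Evolve (feature mask, parameter) pairs to maximize fitness(mask, param)."""
    rng = np.random.default_rng(seed)
    n_param_bits = max(1, int(np.ceil(np.log2(len(param_grid)))))
    length = n_features + n_param_bits

    def decode(chrom):
        mask = chrom[:n_features].astype(bool)
        bits = chrom[n_features:]
        # Modulo maps surplus bit patterns back into the grid (slight bias if
        # len(param_grid) is not a power of two).
        idx = int("".join(map(str, bits)), 2) % len(param_grid)
        return mask, param_grid[idx]

    def score(chrom):
        mask, param = decode(chrom)
        return fitness(mask, param) if mask.any() else -np.inf  # empty subset is invalid

    pop = rng.integers(0, 2, size=(pop_size, length))
    for _ in range(n_gen):
        scores = np.array([score(c) for c in pop])
        new_pop = [pop[int(np.argmax(scores))].copy()]  # elitism: keep the best
        while len(new_pop) < pop_size:
            parents = []
            for _ in range(2):                          # binary tournament selection
                i, j = rng.integers(0, pop_size, 2)
                parents.append(pop[i] if scores[i] >= scores[j] else pop[j])
            cut = rng.integers(1, length)               # one-point crossover
            child = np.concatenate([parents[0][:cut], parents[1][cut:]])
            flip = rng.random(length) < p_mut           # bit-flip mutation
            child[flip] = 1 - child[flip]
            new_pop.append(child)
        pop = np.array(new_pop)
    scores = np.array([score(c) for c in pop])
    mask, param = decode(pop[int(np.argmax(scores))])
    return mask, param, float(scores.max())
```

Running the search several times (for example with different seeds or gene subsets) would yield the pool of expert models in multiple subspaces that the summary mentions.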
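The summary selects ensemble members by validation performance and combines their decisions either by majority voting or by a minimum average proximity rule. A minimal sketch, assuming each expert reports a class label and a distance from each sample to each class plane; the record does not say how the paper defines or normalizes proximity, so the `distances` array here is a hypothetical input.

```python
import numpy as np

def select_members(val_accuracies, n_members):
    """Indices of the n_members experts with the highest validation accuracy."""
    return np.argsort(np.asarray(val_accuracies))[::-1][:n_members]

def majority_vote(predictions):
    """predictions: (n_models, n_samples) integer labels -> (n_samples,) labels.
    Ties go to the smallest label, because np.argmax returns the first maximum."""
    n_classes = int(predictions.max()) + 1
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return np.argmax(counts, axis=0)

def min_average_proximity(distances):
    """distances: (n_models, n_samples, n_classes) sample-to-class-plane distances.
    Average over the models, then pick the class with the smallest mean distance."""
    return np.argmin(distances.mean(axis=0), axis=1)
```

Unlike voting, the proximity rule lets an expert that is very confident (very close to one plane) outweigh several experts that are nearly undecided.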