Feature Selection for Genomic Signal Processing: Unsupervised, Supervised, and Self-Supervised Scenarios

Bibliographic Details
Published in: Journal of Signal Processing Systems, Vol. 61, No. 1, pp. 3-20
Main Authors: Kung, S. Y.; Luo, Yuhui; Mak, Man-Wai
Format: Journal Article
Language: English
Published: Boston: Springer US, 01.10.2010

Summary: The effectiveness of a data mining system lies largely in the representation of pattern vectors. In many bioinformatic applications, data are represented as vectors of extremely high dimension, which motivates research on feature selection. The literature contains many reports on feature selection methods. In terms of training data type, these methods divide into unsupervised and supervised categories; in terms of selection method, they fall into filter and wrapper categories. This paper provides a brief overview of state-of-the-art feature selection methods in all of these categories and highlights sample applications to genomic signal processing. The paper also describes a notion of self-supervision, together with a special method, the vector index adaptive SVM (VIA-SVM), for selecting features under the self-supervision scenario. Furthermore, the paper makes use of a more powerful symmetric doubly supervised formulation, for which VIA-SVM is particularly useful. In several subcellular localization experiments and microarray time-course experiments, VIA-SVM combined with some filter-type metrics appears to deliver a substantial dimension reduction (one order of magnitude) with only slight degradation in accuracy.
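The summary contrasts unsupervised and supervised filter-type selection. The sketch below illustrates that distinction with two generic filter metrics: per-feature variance (unsupervised, ignores labels) and a signal-to-noise-style ratio |mu+ - mu-| / (sigma+ + sigma-) (supervised, binary labels assumed in {+1, -1}), followed by top-k ranking. These are common textbook filter metrics chosen for illustration, not necessarily the metrics used in the paper, and this is not an implementation of VIA-SVM. All function names and the toy data are hypothetical.

```python
from statistics import mean, pstdev, pvariance

def variance_scores(X):
    """Unsupervised filter (illustrative): score each feature column by its variance."""
    n_features = len(X[0])
    return [pvariance([row[j] for row in X]) for j in range(n_features)]

def snr_scores(X, y, eps=1e-12):
    """Supervised filter (illustrative): |mu_pos - mu_neg| / (sigma_pos + sigma_neg).

    Assumes binary labels y in {+1, -1}; eps guards against division by zero.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        scores.append(abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + eps))
    return scores

def top_k(scores, k):
    """Return the indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical toy data: 4 samples x 3 features.
# Feature 0 separates the two classes; feature 2 varies a lot but carries no class signal.
X = [[1.0, 5.0, 0.0],
     [1.2, 5.1, 3.0],
     [3.0, 4.9, 0.1],
     [3.1, 5.0, 2.9]]
y = [1, 1, -1, -1]

print("unsupervised (variance) ranking:", top_k(variance_scores(X), 3))
print("supervised (SNR) ranking:", top_k(snr_scores(X, y), 3))
```

On this toy data the unsupervised variance ranking puts feature 2 first, while the supervised ratio puts feature 0 first, which is the kind of difference label information can make. A wrapper method would instead score candidate feature subsets by training and evaluating a classifier on each.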
ISSN: 1939-8018, 1939-8115
DOI: 10.1007/s11265-008-0273-8