Biomarker discovery and visualization in gene expression data with efficient generalized matrix approximations

Bibliographic Details
Published in: Journal of Bioinformatics and Computational Biology, Vol. 5, no. 2a, p. 251
Main Authors: Li, Wenyuan; Peng, Yanxiong; Huang, Hung-Chung; Liu, Ying
Format: Journal Article
Language: English
Published: Singapore, 01.04.2007
Summary: Most real-world gene expression data sets contain multiple ordinal sample classes, each categorized as either the normal or the diseased type. Traditional feature (attribute) selection methods treat all classes equally and pay no attention to up/down regulation across the normal and diseased types, while gene-specific selection methods consider differential expression across the normal and diseased types but ignore the existence of multiple classes. In this paper, to improve biomarker discovery, we propose to make the best use of both aspects: differential expression (which can be viewed as domain knowledge about gene expression data) and multiple classes (which can be viewed as a characteristic of the data set). We take both aspects into account simultaneously by employing rank-1 generalized matrix approximations (GMA). Our results show that GMA can not only improve the accuracy of sample classification, but also provide a visualization method for effectively analyzing gene expression data on both genes and samples. Based on the mechanism of matrix approximation, we further propose an algorithm, CBiomarker, that discovers compact biomarkers by reducing redundancy.
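The summary does not give the GMA formulation, so the following is only a rough illustration of the general idea of a rank-1 matrix approximation applied to a genes-by-samples matrix. It uses a plain truncated SVD as a stand-in, which is an assumption and not the paper's GMA or CBiomarker algorithm. The toy data, the function name `rank1_approximation`, and the scoring rule (absolute loading on the first left singular vector) are all hypothetical.

```python
import numpy as np


def rank1_approximation(X):
    """Return (u, sigma, v) such that sigma * outer(u, v) is the best
    rank-1 least-squares approximation of X (computed with a plain SVD)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, 0], s[0], Vt[0, :]


# Hypothetical toy data: 6 genes (rows) x 6 samples (columns).
# Columns 0-2 stand for normal samples, columns 3-5 for diseased samples.
rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(6, 6))
X[0, 3:] += 2.0  # gene 0 is up-regulated in the diseased samples
X[1, 3:] -= 2.0  # gene 1 is down-regulated in the diseased samples

# Center each gene so the leading component reflects differences
# between samples rather than the overall expression level.
Xc = X - X.mean(axis=1, keepdims=True)

u, sigma, v = rank1_approximation(Xc)

# Assumed scoring rule: genes with large absolute loadings on u drive
# the rank-1 structure and are treated as candidate markers.
gene_scores = np.abs(u)
top_genes = np.argsort(gene_scores)[::-1][:2]

# The entries of v give one coordinate per sample, which can be plotted
# to check whether normal and diseased samples separate.
sample_coords = v

print("candidate marker genes:", sorted(top_genes.tolist()))
print("sample coordinates:", np.round(sample_coords, 2))
```

In this sketch, the single pair of vectors `u` and `v` scores genes and samples at the same time, which is one way to read the summary's claim that a matrix approximation supports analysis "on both genes and samples". The sign of `u` and `v` is arbitrary in an SVD, so only relative signs within each vector are meaningful.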
ISSN:0219-7200
DOI:10.1142/S0219720007002746