Missing value estimation for DNA microarray gene expression data: local least squares imputation

Bibliographic Details
Published in Bioinformatics Vol. 21; no. 2; pp. 187-198
Main Authors Kim, Hyunsoo, Golub, Gene H., Park, Haesun
Format Journal Article
Language English
Published Oxford Oxford University Press 15.01.2005
Oxford Publishing Limited (England)
Summary: Motivation: Gene expression data often contain missing expression values. Effective missing value estimation methods are needed since many algorithms for gene expression data analysis require a complete matrix of gene array values. In this paper, imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as a least squares optimization process. Results: The proposed local least squares imputation method (LLSimpute) represents a target gene that has missing values as a linear combination of similar genes. The similar genes are chosen by k-nearest neighbors or k coherent genes that have large absolute values of Pearson correlation coefficients. A non-parametric missing value estimation method based on LLSimpute is designed by introducing an automatic k-value estimator. In our experiments, the proposed LLSimpute method shows competitive results when compared with other imputation methods for missing value estimation on various datasets and percentages of missing values in the data. Availability: The software is available at http://www.cs.umn.edu/~hskim/tools.html Contact: hpark@cs.umn.edu
Bibliography:istex:C9F930B6E06A6C1BB8A888F4D653965F940A0D89
local:bth499
ark:/67375/HXZ-VNJD6S3P-M
ISSN: 1367-4803, 1460-2059, 1367-4811
DOI: 10.1093/bioinformatics/bth499
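
The Summary describes LLSimpute as representing a target gene with missing values as a linear combination of its k most similar genes. The following is a minimal Python sketch of that idea, not the authors' software. It makes several assumptions that do not come from the record: the function name `lls_impute_gene` is hypothetical, missing values are encoded as NaN, neighbors are drawn only from genes with no missing values, similarity is Euclidean distance over the target's observed columns, and k is supplied by the caller. The Pearson-correlation variant and the automatic k-value estimator mentioned in the Summary are not implemented here.

```python
import numpy as np

def lls_impute_gene(data, target, k):
    """Fill the NaN entries of row `target` using its k nearest complete rows.

    Hypothetical helper for illustration: rows are genes, columns are experiments.
    """
    row = data[target]
    missing = np.isnan(row)
    if not missing.any():
        return row.copy()
    observed = ~missing

    # Assumption: only genes with no missing values are candidate neighbors.
    complete = np.flatnonzero(~np.isnan(data).any(axis=1))
    if complete.size == 0:
        raise ValueError("no complete genes available as neighbors")

    # Euclidean distance to the target, measured on its observed columns only.
    dists = np.linalg.norm(data[complete][:, observed] - row[observed], axis=1)
    nearest = complete[np.argsort(dists)[:k]]

    A = data[nearest][:, observed]  # neighbors at the target's observed columns
    B = data[nearest][:, missing]   # neighbors at the target's missing columns

    # Least squares: find coefficients x minimizing ||A^T x - w||_2,
    # where w holds the target's observed values.
    coeffs, *_ = np.linalg.lstsq(A.T, row[observed], rcond=None)

    # The same linear combination of the neighbors estimates the missing entries.
    filled = row.copy()
    filled[missing] = B.T @ coeffs
    return filled

# Toy data: the third gene equals 2 * gene0 + 1 * gene1, with its last value missing.
genes = np.array([
    [1.0, 0.0, 2.0, 1.0],
    [0.0, 1.0, 1.0, 3.0],
    [2.0, 1.0, 5.0, np.nan],
])
print(lls_impute_gene(genes, target=2, k=2))
```

Because the toy target gene is an exact linear combination of the two complete genes, the least squares fit reproduces its observed values and the estimate for the missing entry comes out to 5 up to floating-point error. Real expression data would not fit exactly, and the choice of k then matters, which is what the automatic k-value estimator in the Summary is meant to address.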