Partial least squares proportional hazard regression for application to DNA microarray survival data

Bibliographic Details
Published in: Bioinformatics (Oxford, England), Vol. 18, no. 12, pp. 1625-1632
Main Authors: NGUYEN, Danh V; ROCKE, David M
Format: Journal Article
Language: English
Published: Oxford, Oxford University Press, 01.12.2002
Summary: Microarrays are increasingly used in cancer research. When gene transcription data from microarray experiments also contain patient survival information, it is often of interest to predict survival times from the gene expression. In this paper we consider the well-known proportional hazard (PH) regression model for survival analysis. Ordinarily, the PH model is used with a few covariates and many observations (subjects). We consider here the case in which the number of covariates, p, exceeds the number of samples, N, a setting typical of gene expression data from DNA microarrays. Given a response vector of survival times and p gene expressions (covariates), we examine how to predict the survival probabilities when N << p. To cope with the high dimensionality, we reduce the dimension using partial least squares (PLS), with the vector of survival times as the response variable. The extracted PLS gene components are then used as covariates in a PH regression to predict the survival probabilities. We demonstrate the methodology on two cDNA gene expression data sets, both containing survival data. The first data set contains 40 diffuse large B-cell lymphoma (DLBCL) tissue samples, and the second contains 49 tissue samples from patients with locally advanced breast cancer in a prospective study.
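
The two-stage procedure described in the summary (PLS dimension reduction followed by PH regression on the extracted components) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy, simulated data in place of the DLBCL and breast cancer sets, a standard PLS1 deflation scheme, a Newton-Raphson fit of the Cox partial likelihood with no tied event times, and a Breslow-type baseline hazard. The function names `pls_scores`, `cox_fit`, and `breslow_survival` are hypothetical, and censored times are simply passed to PLS as responses, which is a simplification.

```python
import numpy as np

def pls_scores(X, y, k):
    """PLS1 with deflation; returns the N x k matrix of component scores."""
    Xc = X - X.mean(axis=0)                    # center each gene
    yc = y - y.mean()                          # center the response (survival times)
    T = np.zeros((X.shape[0], k))
    for j in range(k):
        w = Xc.T @ yc                          # weights: covariance of each gene with y
        w /= np.linalg.norm(w)
        t = Xc @ w                             # component score per sample
        tt = t @ t
        Xc = Xc - np.outer(t, Xc.T @ t / tt)   # deflate X
        yc = yc - t * (yc @ t / tt)            # deflate y
        T[:, j] = t
    return T

def _loglik(beta, Z, d):
    """Cox partial log-likelihood; rows of Z must be sorted by decreasing time."""
    eta = Z @ beta
    terms = eta - np.log(np.cumsum(np.exp(eta)))
    return float(terms[d > 0].sum())

def cox_fit(Z, time, event, iters=30):
    """Newton-Raphson with step halving (assumes no tied event times)."""
    order = np.argsort(-time)                  # decreasing time: cumsum = risk-set sum
    Z, d = Z[order], event[order]
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        r = np.exp(Z @ beta)
        s0 = np.cumsum(r)
        s1 = np.cumsum(r[:, None] * Z, axis=0)
        s2 = np.cumsum(r[:, None, None] * Z[:, :, None] * Z[:, None, :], axis=0)
        zbar = s1 / s0[:, None]                # risk-set weighted covariate mean
        grad = (d[:, None] * (Z - zbar)).sum(axis=0)
        info = (d[:, None, None] * (s2 / s0[:, None, None]
                - zbar[:, :, None] * zbar[:, None, :])).sum(axis=0)
        step = np.linalg.solve(info, grad)
        ll = _loglik(beta, Z, d)
        while _loglik(beta + step, Z, d) < ll and np.linalg.norm(step) > 1e-10:
            step = step / 2                    # halve the step if the likelihood drops
        beta = beta + step
    return beta

def breslow_survival(Z, time, event, beta, z_new):
    """Survival curve S(t | z_new) from a Breslow-type baseline cumulative hazard."""
    order = np.argsort(-time)
    Zs, d, ts = Z[order], event[order], time[order]
    s0 = np.cumsum(np.exp(Zs @ beta))
    H0 = np.cumsum((d / s0)[::-1])             # accumulate hazard in increasing time
    return ts[::-1], np.exp(-H0 * np.exp(z_new @ beta))

# Simulated stand-in for microarray data with N << p (all values are illustrative).
rng = np.random.default_rng(0)
N, p, k = 50, 100, 2
X = rng.standard_normal((N, p))
risk = 0.5 * X[:, :5].sum(axis=1)              # only the first 5 genes affect the hazard
true_time = rng.exponential(scale=np.exp(-risk))
cens_time = rng.exponential(scale=3.0, size=N)
time = np.minimum(true_time, cens_time)        # observed follow-up time
event = (true_time <= cens_time).astype(float) # 1 = death observed, 0 = censored

T = pls_scores(X, time, k)                     # stage 1: PLS gene components
beta = cox_fit(T, time, event)                 # stage 2: PH regression on components
times, surv = breslow_survival(T, time, event, beta, T[0])
print("PH coefficients on PLS components:", beta)
print("Estimated S(t) for sample 0 at the 5 earliest times:", surv[:5])
```

Because the components are built using the survival times themselves, the fitted curves for training samples are optimistic when N << p; assessing predictions on held-out samples would require projecting new expression profiles onto the fitted PLS weights, which this sketch omits.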
ISSN: 1367-4803, 1367-4811
DOI: 10.1093/bioinformatics/18.12.1625