Partial least squares proportional hazard regression for application to DNA microarray survival data

Bibliographic Details
Published in Bioinformatics (Oxford, England) Vol. 18; no. 12; pp. 1625-1632
Main Authors NGUYEN, Danh V, ROCKE, David M
Format Journal Article
Language English
Published Oxford: Oxford University Press, 01.12.2002
Abstract
Motivation: Microarrays are increasingly used in cancer research. When gene transcription data from microarray experiments also contain patient survival information, it is often of interest to predict the survival times based on the gene expression. In this paper we consider the well-known proportional hazard (PH) regression model for survival analysis. Ordinarily, the PH model is used with a few covariates and many observations (subjects). We consider here the case that the number of covariates, p, exceeds the number of samples, N, a setting typical of gene expression data from DNA microarrays.
Results: For a given vector of response values which are survival times and p gene expressions (covariates) we examine the problem of how to predict the survival probabilities, when N ≪ p. The approach taken to cope with the high dimensionality is to reduce the dimension using partial least squares with the response variable as the vector of survival times. After dimension reduction, the extracted PLS gene components are then used as covariates in a PH regression to predict the survival probabilities. We demonstrate the use of the methodology on two cDNA gene expression data sets, both containing survival data. The first data set contains 40 diffuse large B-cell lymphoma (DLBCL) tissue samples and the second data set contains 49 tissue samples from patients with locally advanced breast cancer in a prospective study.
Availability: The methodology can be implemented using a combination of standard statistical methods, available, for example, in SAS. Sample SAS macro code to implement the methods will be available at http://stat.tamu.edu/~dnguyen/supplemental.html
Contact: dnguyen@stat.tamu.edu; dmrocke@ucdavis.edu
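The two-stage approach described in the abstract (PLS dimension reduction with survival time as the response, then PH regression on the extracted components) can be sketched roughly as follows. This is an illustrative NumPy sketch, not the authors' SAS macros: the function names `pls1_components`, `cox_fit`, and `pls_ph` are hypothetical, censoring is ignored in the PLS step, tied event times are assumed absent, and the step clipping and small ridge term are stabilising assumptions added here because components fitted with N ≪ p can predict the training survival times almost perfectly.

```python
import numpy as np

def pls1_components(X, y, n_components):
    """PLS1 with deflation: return scores T (N x K) and weight vectors W (p x K).
    W[:, k] applies to the deflated X at step k, not to the original X."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X = X - X.mean(axis=0)               # centre each gene
    y = y - y.mean()                     # centre the response (survival times)
    n, p = X.shape
    T = np.zeros((n, n_components))
    W = np.zeros((p, n_components))
    for k in range(n_components):
        w = X.T @ y                      # direction of maximal covariance with y
        w = w / np.linalg.norm(w)
        t = X @ w                        # k-th PLS gene component (score vector)
        tt = t @ t
        X = X - np.outer(t, X.T @ t) / tt    # deflate X
        y = y - t * (t @ y) / tt             # deflate y
        T[:, k], W[:, k] = t, w
    return T, W

def cox_fit(Z, time, event, n_iter=25):
    """Maximise the Cox partial likelihood by Newton-Raphson (assumes no tied times)."""
    Z = np.asarray(Z, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    order = np.argsort(-time)            # descending time: risk set of row i is rows 0..i
    Z, event = Z[order], event[order]
    k = Z.shape[1]
    beta = np.zeros(k)
    for _ in range(n_iter):
        r = np.exp(Z @ beta)
        S0 = np.cumsum(r)
        S1 = np.cumsum(r[:, None] * Z, axis=0)
        S2 = np.cumsum(r[:, None, None] * Z[:, :, None] * Z[:, None, :], axis=0)
        mean = S1 / S0[:, None]          # risk-set weighted mean of covariates
        grad = ((Z - mean) * event[:, None]).sum(axis=0)
        var = S2 / S0[:, None, None] - mean[:, :, None] * mean[:, None, :]
        info = (var * event[:, None, None]).sum(axis=0) + 1e-8 * np.eye(k)  # small ridge (assumption)
        step = np.linalg.solve(info, grad)
        beta = beta + np.clip(step, -1.0, 1.0)   # clipped Newton step (assumption)
    return beta

def pls_ph(X, time, event, n_components=2):
    """Extract PLS components using survival time as response, then fit PH on them."""
    T, _ = pls1_components(X, time, n_components)
    Z = T / T.std(axis=0)                # put components on a common scale
    return cox_fit(Z, time, event)
```

Calling `pls_ph(X, time, event, n_components=2)` with an N x p expression matrix returns one PH coefficient per PLS component. In practice the number of components would be chosen by some validation scheme, and a maintained survival-analysis routine would normally replace the hand-written Newton iteration.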
Author NGUYEN, Danh V (Department of Statistics, Texas A&M University, College Station, TX 77843, United States)
ROCKE, David M (Department of Applied Science, University of California, Davis, CA 95616, United States)
DOI 10.1093/bioinformatics/18.12.1625
Discipline Biology
EISSN 1367-4811
Genre Validation Studies
Research Support, U.S. Gov't, Non-P.H.S
Research Support, U.S. Gov't, P.H.S
Evaluation Studies
Journal Article
GrantInformation NIEHS NIH HHS, grant P43 ES04699
NCI NIH HHS, grant CA90301
PHS HHS, grant DMS 98-70172
ISSN 1367-4803
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Keywords Breast disease
Complementary DNA
Transcription
Prediction
DNA chip
Malignant hemopathy
B-Lymphocyte
Models
Gene expression
Web site
Bioinformatics
Language English
License CC BY 4.0
OpenAccessLink https://escholarship.org/content/qt3wq8s7z4/qt3wq8s7z4.pdf?t=ptt2bb
PMID 12490447
PageCount 8
PublicationDate 2002-12-01
PublicationPlace Oxford
PublicationTitle Bioinformatics (Oxford, England)
PublicationTitleAlternate Bioinformatics
PublicationYear 2002
Publisher Oxford University Press
StartPage 1625
SubjectTerms Biological and medical sciences
Breast Neoplasms - classification
Breast Neoplasms - genetics
Breast Neoplasms - mortality
Fundamental and applied biological sciences. Psychology
Gene Expression - genetics
Gene Expression Regulation, Neoplastic - genetics
General aspects
Humans
Least-Squares Analysis
Lymphoma, B-Cell - classification
Lymphoma, B-Cell - genetics
Lymphoma, B-Cell - mortality
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Models, Genetic
Neoplasms - genetics
Neoplasms - mortality
Oligonucleotide Array Sequence Analysis - methods
Proportional Hazards Models
Regression Analysis
Reproducibility of Results
Sensitivity and Specificity
Survival Analysis
URI https://www.ncbi.nlm.nih.gov/pubmed/12490447
https://search.proquest.com/docview/18685452
https://search.proquest.com/docview/72790692
Volume 18