Partial least squares proportional hazard regression for application to DNA microarray survival data
Published in | Bioinformatics (Oxford, England) Vol. 18; no. 12; pp. 1625 - 1632 |
---|---|
Main Authors | Nguyen, Danh V; Rocke, David M |
Format | Journal Article |
Language | English |
Published | Oxford: Oxford University Press, 1 December 2002 |
Abstract | Motivation: Microarrays are increasingly used in cancer research. When gene transcription data from microarray experiments also contain patient survival information, it is often of interest to predict survival times from gene expression. In this paper we consider the well-known proportional hazard (PH) regression model for survival analysis. Ordinarily, the PH model is used with a few covariates and many observations (subjects). Here we consider the case in which the number of covariates, p, exceeds the number of samples, N, a setting typical of gene expression data from DNA microarrays.
Results: Given a vector of survival times (the response) and p gene expressions (the covariates), we examine how to predict survival probabilities when N ≪ p. To cope with the high dimensionality, the dimension is first reduced using partial least squares (PLS), with the vector of survival times as the response variable. The extracted PLS gene components are then used as covariates in a PH regression to predict the survival probabilities. We demonstrate the methodology on two cDNA gene expression data sets, both containing survival data: the first contains 40 diffuse large B-cell lymphoma (DLBCL) tissue samples, and the second contains 49 tissue samples from patients with locally advanced breast cancer in a prospective study. |
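The two-stage procedure described in the abstract (PLS dimension reduction on the survival times, then a PH regression on the extracted components) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' SAS macros: all function names are hypothetical, PLS1 is computed with the NIPALS algorithm on the observed times (censoring is ignored in this step, following the abstract's description of the response), and tied event times in the Cox fit are handled with the Breslow approximation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pls_components(X: Matrix, y: Sequence[float], n_comp: int) -> Matrix:
    """Hypothetical helper: PLS1 (NIPALS) scores of X for one response y, N x n_comp."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    E = [[row[j] - means[j] for j in range(p)] for row in X]  # centred X, deflated below
    y_bar = sum(y) / n
    f = [v - y_bar for v in y]  # centred survival times (censoring ignored here)
    scores = [[0.0] * n_comp for _ in range(n)]
    for k in range(n_comp):
        # Weight: direction of maximal covariance between X and the current y residual.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # k-th component
        tt = sum(v * v for v in t)
        loading = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        for i in range(n):  # deflate X and y so later components are orthogonal to t
            for j in range(p):
                E[i][j] -= t[i] * loading[j]
            f[i] -= q * t[i]
            scores[i][k] = t[i]
    return scores


def solve(A: Matrix, b: Sequence[float]) -> List[float]:
    """Gaussian elimination with partial pivoting; adequate for a few components."""
    m = len(b)
    M = [list(A[r]) + [b[r]] for r in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            factor = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * m
    for c in reversed(range(m)):
        x[c] = (M[c][m] - sum(M[c][k] * x[k] for k in range(c + 1, m))) / M[c][c]
    return x


def fit_cox(Z: Matrix, time: Sequence[float], event: Sequence[int],
            max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Newton-Raphson maximisation of the Cox partial likelihood (Breslow ties)."""
    n, d = len(Z), len(Z[0])
    beta = [0.0] * d
    for _ in range(max_iter):
        risk = [math.exp(sum(bj * zj for bj, zj in zip(beta, row))) for row in Z]
        score = [0.0] * d
        info = [[0.0] * d for _ in range(d)]
        for i in range(n):
            if not event[i]:
                continue  # censored subjects enter only through the risk sets
            at_risk = [j for j in range(n) if time[j] >= time[i]]
            s0 = sum(risk[j] for j in at_risk)
            zbar = [sum(risk[j] * Z[j][a] for j in at_risk) / s0 for a in range(d)]
            for a in range(d):
                score[a] += Z[i][a] - zbar[a]
                for c in range(d):
                    s2 = sum(risk[j] * Z[j][a] * Z[j][c] for j in at_risk) / s0
                    info[a][c] += s2 - zbar[a] * zbar[c]  # weighted covariance
        step = solve(info, score)
        beta = [bj + sj for bj, sj in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def predict_survival(Z: Matrix, time: Sequence[float], event: Sequence[int],
                     beta: Sequence[float], z_new: Sequence[float], t: float) -> float:
    """Breslow estimate of S(t | z) = exp(-H0(t) * exp(beta . z))."""
    risk = [math.exp(sum(bj * zj for bj, zj in zip(beta, row))) for row in Z]
    h0 = sum(1.0 / sum(risk[j] for j in range(len(Z)) if time[j] >= time[i])
             for i in range(len(Z)) if event[i] and time[i] <= t)
    return math.exp(-h0 * math.exp(sum(bj * zj for bj, zj in zip(beta, z_new))))
```

For example, `T = pls_components(X, times, 2)`, then `beta = fit_cox(T, times, events)`, then `predict_survival(T, times, events, beta, T[0], 5.0)` would give the estimated survival probability at time 5 for the first sample. When N ≪ p the leading PLS components can order the survival times almost perfectly; the partial likelihood then has no finite maximum and the Newton iterations diverge, so fewer components or a penalised fit may be needed. Scoring a new array would also require keeping the PLS weights and loadings, which this sketch discards.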
Availability | The methodology can be implemented using a combination of standard statistical methods, available, for example, in SAS. Sample SAS macros implementing the methods will be available at http://stat.tamu.edu/~dnguyen/supplemental.html |
Contact | dnguyen@stat.tamu.edu; dmrocke@ucdavis.edu |
Author | Nguyen, Danh V (Department of Statistics, Texas A&M University, College Station, TX 77843, United States); Rocke, David M (Department of Applied Science, University of California, Davis, CA 95616, United States) |
BackLink | Pascal-Francis: http://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=14492920; MEDLINE/PubMed: https://www.ncbi.nlm.nih.gov/pubmed/12490447 |
CitedBy_id | crossref_primary_10_1002_sim_2834 crossref_primary_10_7465_jkdi_2014_25_5_1151 crossref_primary_10_1080_02664763_2016_1254731 crossref_primary_10_1186_s12859_020_3423_z crossref_primary_10_1002_cfg_228 crossref_primary_10_1007_s10985_007_9076_7 crossref_primary_10_1089_omi_2009_0003 crossref_primary_10_1016_j_ccr_2009_10_018 crossref_primary_10_1593_neo_121038 crossref_primary_10_1198_sbr_2009_08091 crossref_primary_10_1371_journal_pone_0084253 crossref_primary_10_1155_2014_618412 crossref_primary_10_1038_sj_npp_1300947 crossref_primary_10_1038_s41598_019_41625_z crossref_primary_10_1007_s10985_004_4776_8 crossref_primary_10_1111_j_1467_9469_2009_00685_x crossref_primary_10_1186_1471_2105_7_537 crossref_primary_10_2197_ipsjtbio_9_18 crossref_primary_10_1186_1471_2105_10_72 crossref_primary_10_1109_TCBB_2020_2965934 crossref_primary_10_1016_j_spl_2010_04_011 crossref_primary_10_1111_pcn_13671 crossref_primary_10_1002_sim_3412 crossref_primary_10_1007_s11460_009_0041_y crossref_primary_10_1016_j_chemolab_2012_04_005 crossref_primary_10_1081_STA_120017810 crossref_primary_10_1186_s40246_015_0050_2 crossref_primary_10_1186_bcr2472 crossref_primary_10_1186_1471_2105_8_60 crossref_primary_10_1002_sam_11169 crossref_primary_10_1186_1471_2105_7_203 crossref_primary_10_1186_1742_4682_4_3 crossref_primary_10_1155_2013_632030 crossref_primary_10_1016_j_postharvbio_2020_111413 crossref_primary_10_1093_bioinformatics_bth469 crossref_primary_10_1186_1479_5876_4_50 crossref_primary_10_1198_jasa_2009_tm08622 crossref_primary_10_1016_j_jbiotec_2010_11_016 crossref_primary_10_1093_bioinformatics_btu660 crossref_primary_10_1177_0962280209105024 crossref_primary_10_1093_bib_bbr001 crossref_primary_10_1016_j_aca_2012_01_062 crossref_primary_10_1007_s10985_009_9111_y crossref_primary_10_1371_journal_pone_0213245 crossref_primary_10_1109_TEVC_2007_906660 crossref_primary_10_1186_1742_4682_2_23 crossref_primary_10_1093_bioinformatics_btq660 
crossref_primary_10_1142_S0219720009004412 crossref_primary_10_1021_ac9000282 crossref_primary_10_1080_10543400802277967 crossref_primary_10_1142_S0218126608004459 crossref_primary_10_1371_journal_pcbi_1002975 crossref_primary_10_1093_bioinformatics_btq261 crossref_primary_10_1111_j_1541_0420_2005_00405_x crossref_primary_10_1016_j_infrared_2020_103355 crossref_primary_10_1016_j_csda_2004_02_005 crossref_primary_10_1016_j_chemolab_2018_05_005 crossref_primary_10_1186_1479_5876_3_32 crossref_primary_10_1002_sam_10103 crossref_primary_10_1093_bioinformatics_btl450 crossref_primary_10_1186_bcr1173 crossref_primary_10_3233_CBM_151368 crossref_primary_10_1109_TCBB_2012_31 crossref_primary_10_1158_0008_5472_CAN_07_6595 crossref_primary_10_1186_1471_2105_7_156 crossref_primary_10_1093_bioinformatics_bti824 crossref_primary_10_1152_ajprenal_00722_2009 crossref_primary_10_1016_S1672_0229_06_60022_3 crossref_primary_10_1186_1471_2105_8_192 crossref_primary_10_1016_j_mbs_2004_10_007 crossref_primary_10_1016_j_gaceta_2020_12_017 crossref_primary_10_1186_1471_2164_10_225 crossref_primary_10_1016_j_csda_2008_05_021 crossref_primary_10_1111_biom_13208 crossref_primary_10_1142_S0219720010004914 crossref_primary_10_1016_j_cll_2007_10_010 crossref_primary_10_1038_msb_2011_17 crossref_primary_10_1016_j_jclinepi_2006_10_006 crossref_primary_10_1089_cmb_2008_12TT crossref_primary_10_1002_pmic_200600898 crossref_primary_10_1155_2012_478680 crossref_primary_10_1093_bioinformatics_btl362 crossref_primary_10_1111_j_1541_0420_2006_00660_x crossref_primary_10_1142_S1793536911000763 crossref_primary_10_1016_j_ejso_2006_09_002 crossref_primary_10_1080_10543406_2010_504906 crossref_primary_10_1093_bioinformatics_bti737 crossref_primary_10_1186_1471_2105_9_417 crossref_primary_10_1093_bioinformatics_btq617 |
ContentType | Journal Article |
DOI | 10.1093/bioinformatics/18.12.1625 |
DatabaseName | Pascal-Francis; MEDLINE; MEDLINE (Ovid); PubMed; CrossRef; Biotechnology Research Abstracts; Technology Research Database; Engineering Research Database; Biotechnology and BioEngineering Abstracts; MEDLINE - Academic |
Discipline | Biology |
EISSN | 1367-4811 |
EndPage | 1632 |
Genre | Validation Studies; Research Support, U.S. Gov't, Non-P.H.S.; Research Support, U.S. Gov't, P.H.S.; Evaluation Studies; Journal Article |
GrantInformation | NIEHS NIH HHS (P43 ES04699); NCI NIH HHS (CA90301); PHS HHS (DMS 98-70172) |
ISSN | 1367-4803 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 12 |
Keywords | Breast disease; Complementary DNA; Transcription; Prediction; DNA chip; Malignant hemopathy; B-Lymphocyte; Models; Gene expression; Web site; Bioinformatics |
License | CC BY 4.0 |
OpenAccessLink | https://escholarship.org/content/qt3wq8s7z4/qt3wq8s7z4.pdf?t=ptt2bb |
PMID | 12490447 |
PageCount | 8 |
PublicationDate | 2002-12-01 |
PublicationPlace | Oxford, England |
PublicationTitle | Bioinformatics (Oxford, England) |
PublicationTitleAlternate | Bioinformatics |
PublicationYear | 2002 |
Publisher | Oxford University Press |
StartPage | 1625 |
SubjectTerms | Biological and medical sciences; Breast Neoplasms - classification; Breast Neoplasms - genetics; Breast Neoplasms - mortality; Fundamental and applied biological sciences. Psychology; Gene Expression - genetics; Gene Expression Regulation, Neoplastic - genetics; General aspects; Humans; Least-Squares Analysis; Lymphoma, B-Cell - classification; Lymphoma, B-Cell - genetics; Lymphoma, B-Cell - mortality; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Models, Genetic; Neoplasms - genetics; Neoplasms - mortality; Oligonucleotide Array Sequence Analysis - methods; Proportional Hazards Models; Regression Analysis; Reproducibility of Results; Sensitivity and Specificity; Survival Analysis |
Title | Partial least squares proportional hazard regression for application to DNA microarray survival data |
URI | https://www.ncbi.nlm.nih.gov/pubmed/12490447 https://search.proquest.com/docview/18685452 https://search.proquest.com/docview/72790692 |
Volume | 18 |