Sample size planning for survival prediction with focus on high-dimensional data
Published in | Statistics in medicine Vol. 32; no. 5; pp. 787 - 807 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Chichester, UK: John Wiley & Sons, Ltd, 28.02.2013; Wiley Subscription Services, Inc |
Summary: | Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for the determination of training set sample size for survival prediction. Sample size is chosen to control the difference between optimal and expected prediction error. Prediction is carried out by Cox proportional hazards models. The general approach considers censoring as well as low‐dimensional and high‐dimensional explanatory variables. For dimension reduction in the high‐dimensional setting, a variable selection step is inserted. If not all informative variables are included in the final model, the effect estimates are biased towards zero. The bias affects the prediction error, and its magnitude is influenced by the sample size. For variable selection, we consider two approaches: least absolute shrinkage and selection operator (LASSO) and univariable selection. For univariable selection, we can calculate input parameters for the sample size formula. For the LASSO, supportive simulations are necessary to appropriately choose the input parameters. We investigate the performance of the proposed formulas with the use of simulations. Simulation results support the validity of the sample size formulas. An application to a real data example illustrates the practical implementation of the method. Copyright © 2012 John Wiley & Sons, Ltd. |
---|---|
Bibliography: | ArticleID:SIM5550. Supporting information may be found in the online version of this article. Mainzer Forschungsförderungsprogramm (MAIFOR) (starting 01/2011) funded I. Z |
ISSN: | 0277-6715 1097-0258 |
DOI: | 10.1002/sim.5550 |