Sample size planning for survival prediction with focus on high-dimensional data

Bibliographic Details
Published in: Statistics in Medicine, Vol. 32, no. 5, pp. 787-807
Main authors: Götte, Heiko; Zwiener, Isabella
Format: Journal Article
Language: English
Publisher: John Wiley & Sons, Ltd, Chichester, UK; Wiley Subscription Services, Inc.
Publication date: 28.02.2013

Summary: Sample size planning should reflect the primary objective of a trial. If the primary objective is prediction, the sample size determination should focus on prediction accuracy instead of power. We present formulas for determining the training set sample size for survival prediction. The sample size is chosen to control the difference between the optimal and the expected prediction error. Prediction is carried out with Cox proportional hazards models. The general approach accounts for censoring as well as for low‐dimensional and high‐dimensional explanatory variables. For dimension reduction in the high‐dimensional setting, a variable selection step is inserted. If not all informative variables are included in the final model, the effect estimates are biased towards zero. This bias affects the prediction error, and its magnitude depends on the sample size. For variable selection, we consider two approaches: the least absolute shrinkage and selection operator (LASSO) and univariable selection. For univariable selection, the input parameters for the sample size formula can be calculated. For the LASSO, supportive simulations are necessary to choose the input parameters appropriately. We investigate the performance of the proposed formulas using simulations, and the simulation results support the validity of the sample size formulas. An application to a real data example illustrates the practical implementation of the method. Copyright © 2012 John Wiley & Sons, Ltd.
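The Python sketch below does not reproduce the authors' sample size formulas, which are not given in this record. Under stated assumptions, it illustrates two ingredients the summary mentions: simulating censored survival data from a Cox proportional hazards model and a univariable selection step. Several choices are assumptions for illustration only: the exponential baseline hazard, the exponential censoring, the standard normal covariates, the use of the univariable Cox score (log-rank-type) statistic as the ranking criterion, and all function and parameter names (`simulate_cox_data`, `cox_score_statistic`, `univariable_selection`, `baseline_rate`, `censor_rate`).

```python
import math
import random


def simulate_cox_data(n, betas, baseline_rate=0.1, censor_rate=0.05, seed=1):
    """Simulate (x, time, event) rows under a Cox PH model (illustrative assumptions).

    Assumed setup: independent standard normal covariates, exponential baseline
    hazard `baseline_rate`, and independent exponential censoring with rate
    `censor_rate` (> 0). Event times use inverse-transform sampling:
    T = -log(U) / (baseline_rate * exp(x'beta)).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in betas]
        lin = sum(b * xj for b, xj in zip(betas, x))
        # 1 - random() lies in (0, 1], so the log is always defined.
        t_event = -math.log(1.0 - rng.random()) / (baseline_rate * math.exp(lin))
        t_cens = rng.expovariate(censor_rate)
        rows.append((x, min(t_event, t_cens), t_event <= t_cens))
    return rows


def cox_score_statistic(rows, j):
    """Univariable Cox score chi-square statistic for covariate j at beta = 0.

    For each event, the risk set contains every subject whose observed time is
    >= the event time (Breslow-type handling of ties). The score is the sum of
    (x_i - risk-set mean) and the information is the sum of risk-set variances.
    """
    times = [t for _, t, _ in rows]
    xs = [x[j] for x, _, _ in rows]
    score, info = 0.0, 0.0
    for i, (_, t_i, event_i) in enumerate(rows):
        if not event_i:
            continue
        risk = [xs[k] for k in range(len(rows)) if times[k] >= t_i]
        mean = sum(risk) / len(risk)
        score += xs[i] - mean
        info += sum((v - mean) ** 2 for v in risk) / len(risk)
    return score * score / info if info > 0 else 0.0


def univariable_selection(rows, k):
    """Return the sorted indices of the k covariates with the largest score statistics."""
    p = len(rows[0][0])
    ranked = sorted(((cox_score_statistic(rows, j), j) for j in range(p)), reverse=True)
    return sorted(j for _, j in ranked[:k])


if __name__ == "__main__":
    # Hypothetical scenario: 2 informative and 48 noise covariates, n = 200.
    betas = [1.0, -1.0] + [0.0] * 48
    rows = simulate_cox_data(n=200, betas=betas, seed=42)
    print("events:", sum(e for _, _, e in rows))
    print("selected:", univariable_selection(rows, k=2))
```

Repeating the simulation for several values of `n` shows, under these assumptions, how often informative covariates are missed by the selection step. Missed informative variables are what the summary identifies as the source of the bias towards zero in the effect estimates.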
Article ID: SIM5550
Supporting information may be found in the online version of this article.
Funding: The Mainzer Forschungsförderungsprogramm (MAIFOR) funded I. Z. (starting 01/2011).
ISSN: 0277-6715 (print); 1097-0258 (online)
DOI: 10.1002/sim.5550