On "Some factors predict successful short-term outcomes..." Mintken PE, Cleland JA, Carpenter KJ, et al. Phys Ther. 2010;90:26-42/Author Response

Bibliographic Details
Published in Physical therapy Vol. 90; no. 4; p. 643
Main Authors Cibulka, Michael T, Harrell, Frank E, Mintken, Paul E, Cleland, Joshua A, Carpenter, Kristin J, Bieniek, Melanie L, Keirns, Mike, Whitman, Julie M
Format Journal Article
Language English
Published Washington Oxford University Press 01.04.2010
Summary: The problem is that these phantom degrees of freedom usually are forgotten when computing the standard errors and P values, resulting in data that often are too good to be true.2,3 Cohen et al2 recommended that all predictor variables considered for inclusion at the beginning of a study also should be included when running the regression model, meaning not just the variables that were selected by the computer program. A common problem that often occurs later when computing regression coefficients is that the phantom degrees of freedom are forgotten, resulting in overfitting of the data.2-7 Overfitting occurs when a researcher uses too many predictor variables (46 in this study) with too few study participants.3 The recommended ratio of number of participants (n) to predictor variables (k) when running a normal (simultaneous or forced) regression usually is 15-20/1.7 When running a stepwise regression, most statistical experts recommend an n/k value of no less than 40/1.2,7 For logistic regression, which Mintken and colleagues used in their study, Peduzzi et al6 advocated an events-per-variable (EPV) value greater than 10, where the EPV value is the number of successful outcomes versus nonsuccessful outcomes divided by the number of predictor variables. Not taking these phantom degrees of freedom into account often will create a number-to-predictor (n/k) ratio that is highly inflated, thereby making the ratio look better than it really is.3 Overfitting can create significant validity problems, in that the results of the study may be valid only for the particular sample that was used.3 Therefore, in studies where a clinical prediction rule is developed, care must be taken when applying rules from overfitted samples to the population.
ISSN: 0031-9023
1538-6724
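
The sample-size checks described in the summary can be illustrated with a short Python sketch. It is not taken from the letter: the function names and the example counts (92 participants, 30 successes) are hypothetical, and only the 46 candidate predictors and the thresholds (15/1, 40/1, EPV greater than 10) come from the summary. The EPV function assumes the common convention of dividing the smaller of the two outcome groups by the number of candidate predictors, which is one reading of the summary's wording.

```python
def participants_per_predictor(n_participants: int, n_candidate_predictors: int) -> float:
    """Return the n/k ratio, counting every predictor that was considered."""
    return n_participants / n_candidate_predictors


def events_per_variable(n_success: int, n_nonsuccess: int,
                        n_candidate_predictors: int) -> float:
    """Return EPV, assuming the smaller outcome group is the limiting count."""
    return min(n_success, n_nonsuccess) / n_candidate_predictors


def check_sample_size(n_success: int, n_nonsuccess: int,
                      n_candidate_predictors: int) -> dict:
    """Compare a study against the thresholds quoted in the summary."""
    n = n_success + n_nonsuccess
    nk = participants_per_predictor(n, n_candidate_predictors)
    epv = events_per_variable(n_success, n_nonsuccess, n_candidate_predictors)
    return {
        "n_per_k": nk,
        "epv": epv,
        "ok_forced_regression": nk >= 15,    # lower end of 15-20/1
        "ok_stepwise_regression": nk >= 40,  # no less than 40/1
        "ok_logistic_epv": epv > 10,         # EPV greater than 10
    }


if __name__ == "__main__":
    # 46 candidate predictors is from the summary; the outcome counts
    # below are hypothetical and chosen only to illustrate the arithmetic.
    result = check_sample_size(n_success=30, n_nonsuccess=62,
                               n_candidate_predictors=46)
    for name, value in result.items():
        print(f"{name}: {value}")
```

Counting only the predictors retained by a stepwise procedure in place of all 46 candidates would raise both ratios, which is the inflation the summary warns about.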