Sample Size Guidelines for Logistic Regression from Observational Studies with Large Population: Emphasis on the Accuracy Between Statistics and Parameters Based on Real Life Clinical Data

Bibliographic Details
Published in: The Malaysian Journal of Medical Sciences, Vol. 25, no. 4, pp. 122-130
Main Authors: Bujang, Mohamad Adam; Sa'at, Nadiah; Sidik, Tg Mohd Ikhwan Tg Abu Bakar; Joo, Lim Chien
Format: Journal Article
Language: English
Published: Malaysia, Penerbit Universiti Sains Malaysia, 01.07.2018

Summary: Different study designs and population sizes may require different sample sizes for logistic regression. This study aims to propose sample size guidelines for logistic regression based on observational studies with large populations. We estimated the minimum sample size required using real clinical data, evaluating how accurately the sample statistics reproduced the actual population parameters: the Nagelkerke r-squared and the coefficients derived from each sample were compared with their respective parameters. With a minimum sample size of 500, results showed that the differences between the sample estimates and the population parameters were sufficiently small. Based on an audit of a medium-sized population, the differences were within ± 0.5 for coefficients and ± 0.02 for Nagelkerke r-squared. For a large population, the differences were within ± 1.0 for coefficients and ± 0.02 for Nagelkerke r-squared. For observational studies with a large population size that involve logistic regression in the analysis, a minimum sample size of 500 is necessary to derive statistics that represent the parameters. The other recommended rules of thumb are an EPV (events per variable) of 50 and the formula n = 100 + 50i, where i refers to the number of independent variables in the final model.
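The rules of thumb in the summary can be turned into a quick calculator. This is a minimal sketch, not the authors' procedure: all function names are hypothetical, the EPV rule needs an assumed proportion of the less frequent outcome (the summary gives none), and combining the three criteria by taking the largest is an assumption made here for illustration. The sketch also computes Nagelkerke r-squared from the null and fitted log-likelihoods using the standard Cox-Snell rescaling, the statistic the study compared against its population value.

```python
import math


def n_from_formula(i: int) -> int:
    """Rule of thumb from the summary: n = 100 + 50i,
    where i is the number of independent variables in the final model."""
    return 100 + 50 * i


def n_from_epv(i: int, event_proportion: float, epv: int = 50) -> int:
    """EPV rule: require epv * i events. Dividing by the ASSUMED proportion
    of the less frequent outcome converts events into a total sample size."""
    if not 0 < event_proportion <= 1:
        raise ValueError("event_proportion must be in (0, 1]")
    return math.ceil(epv * i / event_proportion)


def suggested_minimum(i: int, event_proportion: float, floor: int = 500) -> int:
    """Hypothetical combination: the largest of the 500 floor,
    the 100 + 50i formula, and the EPV-based size."""
    return max(floor, n_from_formula(i), n_from_epv(i, event_proportion))


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke r-squared from log-likelihoods of the intercept-only
    model (ll_null) and the fitted model (ll_model) on n observations."""
    # Cox-Snell R^2 = 1 - (L0 / L1)^(2/n), written with log-likelihoods
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Maximum attainable Cox-Snell R^2 = 1 - L0^(2/n)
    max_cox_snell = 1 - math.exp(2 * ll_null / n)
    return cox_snell / max_cox_snell
```

For example, with 5 independent variables and an assumed 25% event rate, the formula gives 350, the EPV rule gives 1000 and the floor is 500, so `suggested_minimum(5, 0.25)` returns 1000.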
ISSN: 1394-195X, 2180-4303
DOI: 10.21315/mjms2018.25.4.12