Fitting the Cox proportional hazards model to big data

Bibliographic Details
Published in Biometrics Vol. 80; no. 1
Main Authors Wang, Jianqiao; Zeng, Donglin; Lin, Dan-Yu
Format Journal Article
Language English
Published England 29.01.2024
Summary: The semiparametric Cox proportional hazards model, together with the partial likelihood principle, has been widely used to study the effects of potentially time-dependent covariates on a possibly censored event time. We propose a computationally efficient method for fitting the Cox model to big data involving millions of study subjects. Specifically, we perform maximum partial likelihood estimation on a small subset of the whole data and improve the initial estimator by incorporating the remaining data through one-step estimation with estimated efficient score functions. We show that the final estimator has the same asymptotic distribution as the conventional maximum partial likelihood estimator using the whole dataset but requires only a small fraction of computation time. We demonstrate the usefulness of the proposed method through extensive simulation studies and an application to the UK Biobank data.
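
The two-stage idea in the summary (an initial fit on a small subset, then a single correction step using more data) can be illustrated with a minimal sketch. This is not the authors' estimator: the record does not give the form of their estimated efficient score functions, so the sketch substitutes a classical one-step Newton update based on the ordinary partial-likelihood score and information of the whole dataset. The function names (`cox_score_info`, `fit_cox`, `one_step`, `simulate`), the simulated data, the subset size, and the restriction to time-independent covariates without tied event times are all assumptions made for illustration.

```python
import numpy as np

def cox_score_info(beta, X, time, event):
    """Partial-likelihood score and observed information (assumes no tied times)."""
    order = np.argsort(-time)  # descending time: the risk set of row i is rows 0..i
    X, event = X[order], event[order].astype(bool)
    w = np.exp(X @ beta)
    s0 = np.cumsum(w)                                   # sum of exp(x'beta) over risk set
    s1 = np.cumsum(w[:, None] * X, axis=0)              # weighted sum of covariates
    s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
    xbar = s1 / s0[:, None]                             # risk-set weighted covariate mean
    score = (X[event] - xbar[event]).sum(axis=0)
    var = s2[event] / s0[event][:, None, None] \
        - xbar[event][:, :, None] * xbar[event][:, None, :]
    return score, var.sum(axis=0)

def fit_cox(X, time, event, n_iter=25, tol=1e-8):
    """Maximum partial likelihood estimate by plain Newton-Raphson from zero."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        score, info = cox_score_info(beta, X, time, event)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def one_step(beta0, X, time, event):
    """Single Newton update of an initial estimate (illustrative stand-in only)."""
    score, info = cox_score_info(beta0, X, time, event)
    return beta0 + np.linalg.solve(info, score)

def simulate(n, beta, seed=0):
    """Hypothetical data: exponential event times with independent censoring."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, len(beta)))
    t_event = rng.exponential(1.0 / np.exp(X @ beta))
    t_cens = rng.exponential(2.0, size=n)
    return X, np.minimum(t_event, t_cens), t_event <= t_cens

beta_true = np.array([0.5, -0.5])
X, time, event = simulate(20000, beta_true)
m = 2000  # assumed subset size for the initial fit
beta_init = fit_cox(X[:m], time[:m], event[:m])
beta_one = one_step(beta_init, X, time, event)
beta_full = fit_cox(X, time, event)  # conventional fit, for comparison only
print("initial :", beta_init)
print("one-step:", beta_one)
print("full fit:", beta_full)
```

In this sketch the iterative fit runs only on the subset, and the whole dataset is visited once for the correction step, which is where the saving in computation would come from under these assumptions.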
ISSN: 0006-341X, 1541-0420
DOI: 10.1093/biomtc/ujae018