Distributed quantile regression for longitudinal big data
Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in vo...
Saved in:
Published in | Computational statistics Vol. 39; no. 2; pp. 751 - 779 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published |
Berlin/Heidelberg
Springer Berlin Heidelberg
01.04.2024
Springer Nature B.V |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high dimensional, hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data due to within-subject dependence. In this paper, we focus on developing a distributed algorithm to support quantile regression (QR) analysis of longitudinal big data, which currently remains an open and challenging issue. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. Based on weights estimated in the first stage by the Newton–Raphson algorithm, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. Further, we also analyze an air quality data set to illustrate the practical performance of this method. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 0943-4062 1613-9658 |
DOI: | 10.1007/s00180-022-01318-0 |