Distributed quantile regression for longitudinal big data

Bibliographic Details
Published in	Computational Statistics Vol. 39; no. 2; pp. 751–779
Main Authors	Fan, Ye; Lin, Nan; Yu, Liqun
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg; Springer Nature B.V., 01.04.2024
Subjects	Air quality; Algorithms; Big Data; Computer networks; Distributed processing; Economic Theory/Quantitative Economics/Mathematical Methods; Mathematics and Statistics; Original Paper; Outdoor air quality; Probability and Statistics in Computer Science; Probability Theory and Stochastic Processes; Public health; Quantiles; Regression; Statistics; Longitudinal analysis; Big data; Weighted quantile regression; ADMM; Distributed algorithm

More Information
Summary:	Longitudinal data, measurements taken from the same subjects over time, appear routinely in many scientific fields, such as biomedical science, public health, ecology and environmental sciences. With the rapid development of information technology, modern longitudinal data are becoming massive in volume and high-dimensional, and hence often require distributed analysis in real-world applications. Standard divide-and-conquer techniques do not apply directly to longitudinal big data due to within-subject dependence. In this paper, we focus on developing a distributed algorithm to support quantile regression (QR) analysis of longitudinal big data, which currently remains an open and challenging issue. We employ weighted quantile regression (WQR) to accommodate the correlation in longitudinal big data, and parallelize the WQR estimation process with a two-stage algorithm to support distributed computing. Based on weights estimated in the first stage by the Newton–Raphson algorithm, the second stage solves the WQR problem using the multi-block alternating direction method of multipliers (ADMM). Simulation studies show that, compared to traditional non-distributed algorithms, our proposed method has favorable estimation accuracy and is computationally more efficient in both non-distributed and distributed environments. We also analyze an air quality data set to illustrate the practical performance of this method.
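The summary describes the second stage only at a high level. As a rough illustration, the Python sketch below minimizes a weighted check loss, sum_i w_i * rho_tau(y_i - x_i'beta) with rho_tau(r) = r * (tau - 1{r < 0}), by ADMM while keeping all n-length work inside row blocks. It is not the authors' algorithm: it swaps in a plain two-block ADMM on the split y = X beta + r (only p-dimensional summaries are aggregated) instead of their multi-block scheme, and it does not reproduce the first-stage Newton–Raphson weight estimation, so the weights `w` are a placeholder (all ones here). The function names (`check_loss`, `check_prox`, `split_by_subject`, `distributed_wqr_admm`), the penalty `sigma`, and the simulated data are hypothetical.

```python
import numpy as np


def check_loss(r, w, tau):
    """Weighted check loss sum_i w_i * rho_tau(r_i), rho_tau(r) = r * (tau - 1{r < 0})."""
    return float(np.sum(w * r * (tau - (r < 0))))


def check_prox(v, c, tau):
    """Elementwise prox of c * rho_tau: asymmetric soft-thresholding of v."""
    return v - np.clip(v, -c * (1.0 - tau), c * tau)


def split_by_subject(subject_ids, n_blocks):
    """Hypothetical partition: every row of a given subject lands in the same block."""
    groups = np.array_split(np.unique(subject_ids), n_blocks)
    return [np.flatnonzero(np.isin(subject_ids, g)) for g in groups]


def distributed_wqr_admm(X, y, w, tau, blocks, sigma=1.0, max_iter=5000, tol=1e-8):
    """Sketch (hypothetical helper): minimize sum_i w_i * rho_tau(y_i - x_i' beta) by
    two-block ADMM on the split y = X beta + r, doing all n-length work per block."""
    Xs = [X[idx] for idx in blocks]
    ys = [y[idx] for idx in blocks]
    cs = [w[idx] / sigma for idx in blocks]          # per-row prox scales w_i / sigma
    gram = sum(Xk.T @ Xk for Xk in Xs)               # p x p, aggregated once
    r = [yk.copy() for yk in ys]                     # local residual variables
    u = [np.zeros_like(yk) for yk in ys]             # local scaled dual variables
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        # beta-step: each block contributes only the p-vector X_k'(y_k - r_k + u_k).
        rhs = sum(Xk.T @ (yk - rk + uk) for Xk, yk, rk, uk in zip(Xs, ys, r, u))
        beta = np.linalg.solve(gram, rhs)
        primal_sq = change_sq = 0.0
        for k in range(len(blocks)):
            fit = ys[k] - Xs[k] @ beta
            r_new = check_prox(fit + u[k], cs[k], tau)   # r-step, local to block k
            gap = fit - r_new                            # y_k - X_k beta - r_k
            u[k] = u[k] + gap                            # dual step, local to block k
            primal_sq += gap @ gap
            change_sq += np.sum((r_new - r[k]) ** 2)
            r[k] = r_new
        # Stop once the RMS primal residual and RMS change in r are both below tol.
        if max(primal_sq, change_sq) < tol**2 * len(y):
            break
    return beta


# Usage: simulated subjects with a random intercept (within-subject dependence).
rng = np.random.default_rng(0)
m, T = 400, 10
subject = np.repeat(np.arange(m), T)
x = rng.normal(size=m * T)
noise = rng.normal(scale=0.5, size=m)[subject] + rng.normal(scale=0.5, size=m * T)
y = 1.0 + 2.0 * x + noise
X = np.column_stack([np.ones_like(x), x])
w = np.ones_like(y)   # placeholder for first-stage weights (not estimated here)
blocks = split_by_subject(subject, n_blocks=4)
beta_hat = distributed_wqr_admm(X, y, w, tau=0.5, blocks=blocks)
print(beta_hat)
```

With symmetric errors and tau = 0.5, the estimate should land near the simulated coefficients (1, 2). Keeping each subject's rows in one block is one simple way to respect within-subject dependence; in a real deployment the per-block loops would run on separate workers, and only the p-vectors and the one-time p x p Gram matrices would be communicated.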
ISSN:	0943-4062 (print); 1613-9658 (online)
DOI:	10.1007/s00180-022-01318-0