Optimal subsampling for linear quantile regression models

Bibliographic Details
Published in	Canadian Journal of Statistics, Vol. 49, no. 4, pp. 1039-1057
Main Authors	FAN, Yan; LIU, Yukun; ZHU, Lixing
Format	Journal Article
Language	English
Published	Hoboken, USA: Wiley, 01.12.2021 (John Wiley & Sons, Inc; Wiley Subscription Services, Inc)
Subjects	Asymptotic methods; Big Data; Estimation; Hansen–Hurwitz estimator; linear quantile regression; Normality; optimal subsampling; Random sampling; Regression models; Sampling methods; Simulation; uniform sampling
ISSN	0319-5724; 1708-945X
DOI	10.1002/cjs.11590

Summary:	Subsampling techniques are efficient methods for handling big data. Quite a few optimal sampling methods have been developed for parametric models in which the loss functions are differentiable with respect to the parameters. However, they do not apply to quantile regression (QR) models, because the check function involved is not differentiable. To circumvent the non-differentiability problem, we consider directly estimating the linear QR coefficient by minimizing the Hansen–Hurwitz estimator of the usual loss function for QR. We establish the asymptotic normality of the resulting estimator under a generic sampling method, and then develop optimal subsampling methods for linear QR. In particular, we propose a one-stage subsampling method, which depends only on the lengths of the covariates, and a two-stage subsampling method, which combines the one-stage sampling method with the ideal optimal subsampling method. Our simulation studies, including ones based on real data, show that the two recommended sampling methods always outperform simple random sampling in terms of mean square error, whether the linear QR model is valid or not.
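
The estimation idea in the summary can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the sampling probabilities are taken to be proportional to the Euclidean length of each covariate vector (one reading of "depends only on the lengths of covariates"), the data are simulated, and the names check_loss, length_probs, hh_loss and fit_subsample_qr are hypothetical. The minimizer scans every line through two sampled points, which is exact only for a two-coefficient model and practical only for small subsamples.

```python
import math
import random
from itertools import combinations


def check_loss(u, tau):
    """Quantile check function: rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def length_probs(X):
    """Assumed one-stage rule: pi_i proportional to the Euclidean length of x_i."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in X]
    total = sum(norms)
    return [nrm / total for nrm in norms]


def hh_loss(beta, X, y, probs, draws, tau):
    """Hansen-Hurwitz estimate of the full-data check loss.

    `draws` holds indices sampled with replacement using `probs`; each term
    is divided by its selection probability and the sum is averaged.
    """
    total = 0.0
    for i in draws:
        resid = y[i] - sum(b * v for b, v in zip(beta, X[i]))
        total += check_loss(resid, tau) / probs[i]
    return total / len(draws)


def fit_subsample_qr(X, y, probs, draws, tau):
    """Minimize hh_loss for a two-coefficient model.

    A weighted check loss is convex and piecewise linear, so a minimizer lies
    on a fit that interpolates two sampled points; all such fits are scanned.
    """
    best_beta, best_val = None, math.inf
    for i, j in combinations(sorted(set(draws)), 2):
        det = X[i][0] * X[j][1] - X[i][1] * X[j][0]
        if abs(det) < 1e-12:
            continue  # these two rows do not determine a unique fit
        # Cramer's rule for X[i].beta = y[i] and X[j].beta = y[j]
        b0 = (y[i] * X[j][1] - X[i][1] * y[j]) / det
        b1 = (X[i][0] * y[j] - y[i] * X[j][0]) / det
        val = hh_loss((b0, b1), X, y, probs, draws, tau)
        if val < best_val:
            best_beta, best_val = (b0, b1), val
    return best_beta


# Simulated full data (an assumption for illustration): y = 1 + 2*x + noise.
rng = random.Random(0)
N, n, tau = 2000, 80, 0.5
X = [(1.0, rng.gauss(0.0, 1.0)) for _ in range(N)]
y = [1.0 + 2.0 * x1 + rng.gauss(0.0, 1.0) for _, x1 in X]

probs = length_probs(X)                                   # sampling probabilities
draws = rng.choices(range(N), weights=probs, k=n)         # with replacement
beta_hat = fit_subsample_qr(X, y, probs, draws, tau)
print(beta_hat)  # expected to land near the simulated coefficients (1, 2)
```

For models with more coefficients, the same weighted check loss would normally be handed to a linear-programming solver instead of the pairwise scan. The two-stage method mentioned in the summary is not reproduced here, since the abstract does not give its sampling probabilities.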