Robust linear regression for high‐dimensional data: An overview

Bibliographic Details
Published in: Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 13, no. 4
Main authors: Filzmoser, Peter; Nordhausen, Klaus
Format: Journal article
Language: English
Published: Hoboken, USA: John Wiley & Sons, Inc., 01.07.2021
Summary: Digitization, the process of converting information into numbers, leads to bigger and more complex data sets, bigger also with respect to the number of measured variables. This makes it harder, or even impossible, for the practitioner to identify outliers or observations that are inconsistent with an underlying model. Classical least‐squares‐based procedures can be affected by such outliers. In the regression context, this means that the parameter estimates are biased, with consequences for the validity of statistical inference, for regression diagnostics, and for prediction accuracy. Robust regression methods aim to assign appropriate weights to observations that deviate from the model. While robust regression techniques are widely known in the low‐dimensional case, researchers and practitioners might still not be very familiar with developments in this direction for high‐dimensional data. Recently, different strategies have been proposed for robust regression in the high‐dimensional case, typically based on dimension reduction, on shrinkage (including sparsity), and on combinations of such techniques. A very recent concept is to downweight single cells of the data matrix rather than complete observations, with the goal of making better use of the model‐consistent information and thus achieving higher efficiency of the parameter estimates.

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Robust Methods
Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data
Statistical and Graphical Methods of Data Analysis > Dimension Reduction

The big data era increases the probability of outliers in the data and thus creates an urgent need for robust statistical methods, as described here for the high‐dimensional regression problem.
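The summary describes two ideas that can be combined: robust regression assigns small weights to observations that deviate from the model, and high‐dimensional methods add shrinkage with sparsity. As a minimal illustration of that combination, the sketch below fits a Huber‐loss lasso by proximal gradient descent (ISTA) with NumPy. This is one generic technique chosen for illustration, not a method taken from the article; the function name `huber_lasso`, the fixed Huber threshold `delta` (no robust scale estimation), the penalty `lam`, and the simulated data are all assumptions. The returned weights min(1, delta/|r_i|) are the implied IRLS weights, which show how gross outliers are downweighted.

```python
import numpy as np


def huber_psi(r, delta):
    """Derivative of the Huber loss: identity near zero, clipped at +/- delta."""
    return np.clip(r, -delta, delta)


def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (componentwise soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def huber_lasso(X, y, lam, delta=1.0, n_iter=3000):
    """Hypothetical sketch: minimise (1/n) * sum huber_delta(y - X b) + lam * ||b||_1.

    Plain proximal gradient descent (ISTA). The step size 1/L with
    L = ||X||_2^2 / n is valid because psi has slope at most 1.
    Assumption: delta is fixed, not derived from a robust residual scale.
    """
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n   # largest singular value squared, over n
    b = np.zeros(p)
    for _ in range(n_iter):
        r = y - X @ b
        grad = -X.T @ huber_psi(r, delta) / n   # gradient of the Huber part
        b = soft_threshold(b - grad / L, lam / L)
    r = y - X @ b
    # Implied weights psi(r)/r = min(1, delta/|r|): large residuals get small weights
    weights = np.minimum(1.0, delta / np.maximum(np.abs(r), 1e-12))
    return b, weights


# Simulated example (assumed data): more variables than observations
rng = np.random.default_rng(0)
n, p = 60, 100
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]                  # sparse true coefficients
y = X @ beta + rng.normal(scale=0.1, size=n)
y[:6] += 20.0                                # six gross outliers in the response

b_hat, w = huber_lasso(X, y, lam=0.2)
print(np.sort(np.argsort(-np.abs(b_hat))[:3]))   # indices of the 3 largest |coefficients|
print(w[:6].round(3))                             # weights of the outlying observations
```

Because the Huber loss grows only linearly for large residuals, each outlier's influence on the gradient is bounded by `delta`, while the L1 penalty keeps most coefficients exactly zero. This sketch handles outliers in the response only; the dimension‐reduction and cellwise‐downweighting strategies mentioned in the summary are not covered here.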
Funding information: European Information and Technology Raw Materials, Grant/Award Number: 16329
ISSN: 1939-5108; 1939-0068
DOI: 10.1002/wics.1524