Fast cross-validation for multi-penalty ridge regression
Format | Journal Article
---|---
Language | English
Published | 19.05.2020
Summary: High-dimensional prediction with multiple data types needs to account for potentially strong differences in predictive signal. Ridge regression is a simple model for high-dimensional data that has challenged the predictive performance of many more complex models and learners, and that allows inclusion of data-type-specific penalties. The largest challenge for multi-penalty ridge is to optimize these penalties efficiently in a cross-validation (CV) setting, in particular for GLM and Cox ridge regression, which require an additional estimation loop by iterative weighted least squares (IWLS). Our main contribution is a computationally very efficient formula for the multi-penalty, sample-weighted hat matrix, as used in the IWLS algorithm. As a result, nearly all computations are in low-dimensional space, yielding a speed-up of several orders of magnitude. We developed a flexible framework that facilitates multiple types of response, unpenalized covariates, several performance criteria and repeated CV. Extensions to paired and preferential data types are included and illustrated on several cancer genomics survival prediction problems. Moreover, we present similar computational shortcuts for maximum marginal likelihood and Bayesian probit regression. The corresponding R package, multiridge, serves as a versatile standalone tool, but also as a fast benchmark for other more complex models and multi-view learners.
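The low-dimensional computation the summary describes can be illustrated with a standard Woodbury-style identity for sample-weighted, multi-penalty ridge: (X'WX + Λ)⁻¹X'W = Λ⁻¹X'(XΛ⁻¹X' + W⁻¹)⁻¹, so only per-data-type n×n Gram matrices are needed when p ≫ n. The NumPy sketch below verifies this identity numerically; it is illustrative only and does not reproduce the multiridge implementation or API.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p1, p2 = 20, 500, 300           # n samples, two high-dimensional data types
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
X = np.hstack([X1, X2])
y = rng.standard_normal(n)          # working response (z in an IWLS step)
w = rng.uniform(0.5, 1.5, size=n)   # IWLS-style sample weights
lam1, lam2 = 5.0, 50.0              # data-type-specific penalties

# Direct (p x p) solve: beta = (X'WX + Lambda)^{-1} X'W y
Lam = np.diag(np.r_[np.full(p1, lam1), np.full(p2, lam2)])
W = np.diag(w)
beta_direct = np.linalg.solve(X.T @ W @ X + Lam, X.T @ (w * y))

# Low-dimensional (n x n) solve via
# (X'WX + Lambda)^{-1} X'W = Lambda^{-1} X' (X Lambda^{-1} X' + W^{-1})^{-1}:
# only the per-block Gram matrices X_b X_b' enter, each n x n.
K = X1 @ X1.T / lam1 + X2 @ X2.T / lam2   # X Lambda^{-1} X'
alpha = np.linalg.solve(K + np.diag(1.0 / w), y)
beta_kernel = np.r_[X1.T @ alpha / lam1, X2.T @ alpha / lam2]

print(np.allclose(beta_direct, beta_kernel))  # True
```

The fitted values (and hence the hat matrix used in CV) follow as K(K + W⁻¹)⁻¹, so each IWLS iteration stays in n-dimensional sample space regardless of the number of features.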
DOI: 10.48550/arxiv.2005.09301