Nonparametric Regression via Variance-Adjusted Gradient Boosting Gaussian Process Regression
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 33, No. 6, pp. 2669-2679
Main Authors:
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.06.2021
Subjects:
Summary: Regression models have broad applications in data analytics. Gaussian process regression is a nonparametric regression model that learns nonlinear maps from input features to real-valued output using a kernel function that constructs the covariance matrix among all pairs of data. Gaussian process regression often performs well in various applications. However, the time complexity of Gaussian process regression is $O(n^3)$ for a training dataset of size $n$. The cubic time complexity hinders Gaussian process regression from scaling up to large datasets. Guided by the properties of Gaussian distributions, we developed a variance-adjusted gradient boosting algorithm for approximating Gaussian process regression (VAGR). VAGR sequentially approximates the full Gaussian process regression model using the residuals computed from variance-adjusted predictions based on randomly sampled training subsets. VAGR has a time complexity of $O(nm^3)$ for a training dataset of size $n$ and the chosen batch size $m$. The reduced time complexity allows us to apply VAGR to much larger datasets compared with the full Gaussian process regression. Our experiments suggest that VAGR has a prediction performance comparable to or better than models that include random forest, gradient boosting machines, support vector regressions, and stochastic variational inference for Gaussian process regression.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2019.2953728
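To make the procedure in the summary concrete, below is a minimal Python sketch of a boosting loop with Gaussian process base learners fitted on randomly sampled subsets of size $m$, which is where the $O(nm^3)$ cost comes from. This is not the authors' code: the function names (`fit_vagr_sketch`, `predict_vagr_sketch`), the scikit-learn kernel choice, and in particular the variance-based weight `noise / (noise + std**2)` are illustrative assumptions, since the abstract does not state the exact variance-adjustment formula.

```python
# Minimal sketch (under stated assumptions) of boosting GP base learners
# on random subsets of the training data, as described in the abstract.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def fit_vagr_sketch(X, y, n_stages=50, batch_size=200, noise=1e-2, seed=0):
    """Fit a sequence of subset GPs to variance-adjusted residuals."""
    rng = np.random.default_rng(seed)
    residual = np.asarray(y, dtype=float).copy()
    stages = []
    for _ in range(n_stages):
        # Randomly sample a training batch of size m (the chosen batch size).
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(noise))
        gp.fit(X[idx], residual[idx])
        # Posterior mean and standard deviation on the full training set.
        mean, std = gp.predict(X, return_std=True)
        # Assumed variance adjustment (not the published formula):
        # shrink predictions with large posterior variance before updating
        # the residuals that the next stage will fit.
        step = (noise / (noise + std ** 2)) * mean
        residual -= step
        stages.append(gp)
    return stages


def predict_vagr_sketch(stages, X, noise=1e-2):
    """Sum the variance-adjusted predictions of all boosting stages."""
    pred = np.zeros(len(X))
    for gp in stages:
        mean, std = gp.predict(X, return_std=True)
        pred += (noise / (noise + std ** 2)) * mean
    return pred
```

Each stage only inverts an $m \times m$ covariance matrix, so one pass over the data costs roughly $O(nm^3)$ rather than the $O(n^3)$ of a full Gaussian process fit; the specific shrinkage weight above is only one plausible way to realize the "variance-adjusted predictions" the abstract mentions.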