Nonparametric Regression via Variance-Adjusted Gradient Boosting Gaussian Process Regression

Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 33, No. 6, pp. 2669-2679
Main Authors: Lu, Hsin-Min; Chen, Jih-Shin; Liao, Wei-Chun
Format: Journal Article
Language: English
Published: New York: IEEE, 01.06.2021
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects

Summary: Regression models have broad applications in data analytics. Gaussian process regression is a nonparametric regression model that learns nonlinear maps from input features to real-valued output using a kernel function that constructs the covariance matrix among all pairs of data. Gaussian process regression often performs well in various applications. However, the time complexity of Gaussian process regression is O(n^3) for a training dataset of size n. The cubic time complexity hinders Gaussian process regression from scaling up to large datasets. Guided by the properties of Gaussian distributions, we developed a variance-adjusted gradient boosting algorithm for approximating Gaussian process regression (VAGR). VAGR sequentially approximates the full Gaussian process regression model using the residuals computed from variance-adjusted predictions based on randomly sampled training subsets. VAGR has a time complexity of O(nm^3) for a training dataset of size n and a chosen batch size m. The reduced time complexity allows us to apply VAGR to much larger datasets than the full Gaussian process regression. Our experiments suggest that VAGR has a prediction performance comparable to or better than models that include random forest, gradient boosting machines, support vector regressions, and stochastic variational inference for Gaussian process regression.
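
Illustrative sketch: the abstract describes VAGR only at a high level, so the Python snippet below is a minimal, hypothetical rendering of a boosting-style Gaussian process approximation, not the authors' implementation. Each round fits a GP (here scikit-learn's GaussianProcessRegressor) on a random batch of size m, shrinks its prediction by a variance-based weight, and updates the residuals. The adjustment weight lr / (1 + sigma^2), the learning rate, and the helper names vagr_fit / vagr_predict are assumptions; the paper's exact variance-adjustment rule is not given in the abstract. The point of the sketch is that each round only factorizes an m-by-m kernel matrix instead of the full n-by-n matrix, which is how a method of this kind avoids the O(n^3) cost of full Gaussian process regression.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def vagr_fit(X, y, n_rounds=20, batch_size=200, learning_rate=0.5, seed=0):
    """Boosting-style approximation of GP regression (hypothetical sketch).

    Each round fits a GP on a random subset of size `batch_size` and
    subtracts a variance-adjusted prediction from the running residuals.
    The weight lr / (1 + sigma^2) is a placeholder for the paper's
    variance adjustment, which the abstract does not spell out.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    residual = np.asarray(y, dtype=float).copy()
    models = []
    for _ in range(n_rounds):
        # Random training subset of size m (the batch size in the abstract).
        idx = rng.choice(len(X), size=min(batch_size, len(X)), replace=False)
        gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(),
                                      normalize_y=True)
        gp.fit(X[idx], residual[idx])
        # Predictive mean and standard deviation on the full training set.
        mean, std = gp.predict(X, return_std=True)
        # Hypothetical variance adjustment: down-weight uncertain predictions.
        weight = learning_rate / (1.0 + std ** 2)
        residual -= weight * mean
        models.append((gp, learning_rate))
    return models

def vagr_predict(models, X_new):
    """Sum the variance-adjusted stage predictions for new inputs."""
    X_new = np.asarray(X_new, dtype=float)
    pred = np.zeros(len(X_new))
    for gp, lr in models:
        mean, std = gp.predict(X_new, return_std=True)
        pred += (lr / (1.0 + std ** 2)) * mean
    return pred

Usage would follow the usual fit/predict pattern, e.g. models = vagr_fit(X_train, y_train, batch_size=200) followed by y_hat = vagr_predict(models, X_test); the number of rounds and batch size trade accuracy against the O(nm^3)-style cost noted in the abstract.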
ISSN:1041-4347
1558-2191
DOI:10.1109/TKDE.2019.2953728