Divide and Conquer Local Average Regression

Bibliographic Details
Published in	arXiv.org
Main Authors	Chang, Xiangyu; Lin, Shaobo; Wang, Yao
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 13.03.2016
Subjects	Data analysis; Regression analysis
Summary:	The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks and then combines the independent results from those blocks to obtain a final decision, has been recognized as a state-of-the-art method for overcoming the challenges of massive data analysis. In this paper, we merge the divide and conquer strategy with local average regression methods to infer the regressive relationship of input-output pairs from a massive data set. After theoretically analyzing the pros and cons, we find that although divide and conquer local average regression can reach the optimal learning rate, the restriction on the number of data blocks is rather strong, which makes the method feasible only for a small number of data blocks. We then propose two variants to lessen (or remove) this restriction. Our results show that these variants can achieve the optimal learning rate under a much milder restriction (or without any such restriction). Extensive experimental studies are carried out to verify our theoretical assertions.
ISSN:	2331-8422
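
The strategy described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one-dimensional inputs, uses k-nearest-neighbor averaging as the local average estimator, and combines block estimates by a size-weighted average. The function names `knn_estimate` and `dc_knn_regression`, and all parameters, are hypothetical and chosen only for illustration. The two variants proposed in the paper are not shown.

```python
import numpy as np


def knn_estimate(x_train, y_train, x_query, k):
    """Local average estimate: mean response of the k nearest training points."""
    # Pairwise distances, shape (n_query, n_train); 1-D inputs are assumed.
    dist = np.abs(x_query[:, None] - x_train[None, :])
    # Indices of the k smallest distances in each row (order within k is arbitrary).
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def dc_knn_regression(x, y, x_query, n_blocks, k, seed=0):
    """Divide and conquer: fit on each block independently, then average.

    Assumption: block estimates are combined with weights proportional
    to block size; every block must contain at least k points.
    """
    rng = np.random.default_rng(seed)
    # Divide: randomly partition the sample indices into n_blocks blocks.
    blocks = np.array_split(rng.permutation(len(x)), n_blocks)
    # Conquer: compute a local average estimate on each block separately.
    estimates = np.stack([knn_estimate(x[b], y[b], x_query, k) for b in blocks])
    # Combine: size-weighted average of the per-block estimates.
    sizes = np.array([len(b) for b in blocks], dtype=float)
    return np.average(estimates, axis=0, weights=sizes)
```

With `n_blocks=1` the sketch reduces to ordinary k-nearest-neighbor regression on the full data set. As `n_blocks` grows, each block holds fewer points, which is where the restriction on the number of blocks mentioned in the summary becomes relevant.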