Fast Computation of Kernel Estimators

Bibliographic Details
Published in	Journal of Computational and Graphical Statistics, Vol. 19, no. 1, pp. 205-220
Main Authors	Raykar, Vikas C.; Duraiswami, Ramani; Zhao, Linda H.
Format	Journal Article
Language	English
Published	Alexandria: Taylor & Francis, 01.03.2010; JCGS Management Committee of the American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America; Taylor & Francis Ltd
Subjects	Accuracy; Algorithms; Approximation; Bandwidth estimation; Binning; Comparative analysis; Data bandwidth; Density estimation; Error rates; Estimate reliability; Estimating techniques; Estimation methods; Estimators; Evaluation points; Fast Fourier transform; Human error; Kernel density derivative estimation; Kernel density estimation; Normal distribution; Studies; Wands
Summary:	The computational complexity of evaluating the kernel density estimate (or its derivatives) at m evaluation points given n sample points scales quadratically as O(nm), making it prohibitively expensive for large datasets. While approximate methods like binning can speed up the computation, they lack precise control over the accuracy of the approximation. There is no straightforward way of choosing the binning parameters a priori in order to achieve a desired approximation error. We propose a novel, computationally efficient ε-exact approximation algorithm for univariate Gaussian kernel-based density derivative estimation that reduces the computational complexity from O(nm) to linear O(n+m). The user can specify a desired accuracy ε. The algorithm guarantees that the actual error between the approximation and the original kernel estimate will always be less than ε. We also apply our proposed fast algorithm to speed up automatic bandwidth selection procedures. We compare our method to the best available binning methods in terms of speed and accuracy. Our experimental results show that the proposed method is almost twice as fast as the best binning methods and is around five orders of magnitude more accurate. The software for the proposed method is available online.
ISSN:	1061-8600, 1537-2715
DOI:	10.1198/jcgs.2010.09046
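
Note:	To make the O(nm) cost mentioned in the summary concrete, the following is a minimal sketch of the direct (exact) evaluation of a univariate Gaussian kernel density derivative estimate. It is the quadratic baseline, not the fast ε-exact algorithm of the article. The function names `hermite` and `kde_derivative_direct` and the parameters `h` (bandwidth) and `r` (derivative order) are hypothetical and chosen for illustration only. The sketch relies on the standard identity that the r-th derivative of the Gaussian kernel equals (-1)^r He_r(u) times the kernel, where He_r is the probabilists' Hermite polynomial.

```python
import math


def hermite(r, u):
    """Probabilists' Hermite polynomial He_r(u) via the three-term recurrence
    He_{k+1}(u) = u * He_k(u) - k * He_{k-1}(u), with He_0 = 1 and He_1 = u."""
    prev, cur = 1.0, u
    if r == 0:
        return prev
    for k in range(1, r):
        prev, cur = cur, u * cur - k * prev
    return cur


def kde_derivative_direct(samples, points, h, r=0):
    """Directly evaluate the r-th derivative of a Gaussian kernel density
    estimate with bandwidth h at each evaluation point.

    Hypothetical helper for illustration: the nested loops touch every
    (evaluation point, sample) pair, which is the O(nm) cost.
    """
    n = len(samples)
    norm = 1.0 / (n * h ** (r + 1) * math.sqrt(2.0 * math.pi))
    sign = (-1) ** r
    out = []
    for y in points:          # m evaluation points
        total = 0.0
        for x in samples:     # n sample points
            u = (y - x) / h
            total += hermite(r, u) * math.exp(-0.5 * u * u)
        out.append(sign * norm * total)
    return out


if __name__ == "__main__":
    data = [0.1, -0.4, 1.2, 0.7]
    grid = [-1.0, 0.0, 1.0]
    print(kde_derivative_direct(data, grid, h=0.5, r=0))  # density estimate
    print(kde_derivative_direct(data, grid, h=0.5, r=2))  # second derivative
```

With r=0 the function returns the ordinary kernel density estimate. Doubling both the number of samples and the number of evaluation points roughly quadruples the work, which is the scaling the summary describes as prohibitive for large datasets.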