The p-norm generalization of the LMS algorithm for adaptive filtering

Bibliographic Details
Published in	IEEE Transactions on Signal Processing, Vol. 54, no. 5, pp. 1782-1793
Main Authors	Kivinen, J., Warmuth, M.K., Hassibi, B.
Format	Journal Article
Language	English
Published	New York, NY: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2006
Subjects	Adaptive filtering; Adaptive filters; Algorithm design and analysis; Algorithms; Applied sciences; Bregman divergences; Detection, estimation, filtering, equalization, prediction; Exact sciences and technology; Filtering algorithms; Generalized linear models; Information, signal and communications theory; Input variables; Kernel; least mean squares; Least squares approximation; Machine learning algorithms; Maximum likelihood detection; online learning; Signal and communications theory; Signal, noise; Studies; Telecommunications and information theory; Transfer functions; Vectors; H infinite optimization; Adaptive algorithm; Probabilistic approach; Worst case method; Non linear function; On-line systems; Linear model; Kernel method; Linear system; H∞ optimality; Linear filtering; Transfer function; Learning algorithm; Logistics; Least mean squares methods

Summary:	Recently, much work has been done analyzing online machine learning algorithms in a worst-case setting, where no probabilistic assumptions are made about the data. This is analogous to the H∞ setting used in adaptive linear filtering. Bregman divergences have become a standard tool for analyzing online machine learning algorithms. Using these divergences, we motivate a generalization of the least mean squares (LMS) algorithm. The loss bounds for these so-called p-norm algorithms involve norms other than the standard 2-norm. The bounds can be significantly better if a large proportion of the input variables are irrelevant, i.e., if the weight vector we are trying to learn is sparse. We also prove results for nonstationary targets. We only know how to apply kernel methods to the standard LMS algorithm (i.e., p=2). However, even in the general p-norm case, we can handle generalized linear models where the output of the system is a linear function combined with a nonlinear transfer function (e.g., the logistic sigmoid).
ISSN:	1053-587X, 1941-0476
DOI:	10.1109/TSP.2006.872551
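
Illustration:	The summary describes p-norm algorithms as a generalization of LMS that reduces to standard LMS at p=2. The sketch below shows one common formulation from the online learning literature and is not taken from the article itself: a parameter vector theta receives the usual LMS-style gradient step, and the weights used for prediction are obtained through a link function equal to the gradient of 0.5 * ||theta||_p^2. The names link and p_norm_lms, the learning rate eta, and the choice of which norm exponent is applied to theta are assumptions made for illustration; the article's exact parameterization and tuning may differ.

```python
import numpy as np


def link(theta, p):
    """Gradient of 0.5 * ||theta||_p^2 (assumed link); the identity map when p == 2."""
    norm = np.linalg.norm(theta, ord=p)
    if norm == 0.0:
        return np.zeros_like(theta)
    return np.sign(theta) * np.abs(theta) ** (p - 1) / norm ** (p - 2)


def p_norm_lms(X, y, p=2.0, eta=0.05):
    """Online p-norm update with squared loss (illustrative sketch, p >= 2).

    X: (T, n) array of input vectors, y: (T,) array of target outputs.
    Returns the final weight vector and the predictions made online.
    """
    theta = np.zeros(X.shape[1])
    preds = np.empty(len(y))
    for t, (x_t, y_t) in enumerate(zip(X, y)):
        w = link(theta, p)            # weights used for this prediction
        preds[t] = w @ x_t            # predict before seeing the target
        theta -= eta * (preds[t] - y_t) * x_t  # gradient step on theta
    return link(theta, p), preds
```

With p=2 the link is the identity, so the loop is the plain LMS update. Larger values of p change the geometry of the update, which is the regime where the summary says the bounds can improve for sparse target weight vectors.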