Beyond Backprop: Online Alternating Minimization with Auxiliary Variables
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 23.06.2018 |
DOI | 10.48550/arxiv.1806.09077 |
Summary: Despite significant recent advances in deep neural networks, training them remains a challenge due to the highly non-convex nature of the objective function. State-of-the-art methods rely on error backpropagation, which suffers from several well-known issues, such as vanishing and exploding gradients, the inability to handle non-differentiable nonlinearities or to parallelize weight updates across layers, and biological implausibility. These limitations continue to motivate the exploration of alternative training algorithms, including several recently proposed auxiliary-variable methods, which break the complex nested objective function into local subproblems. However, those techniques are mainly offline (batch) methods, which limits their applicability to extremely large datasets as well as to online, continual, or reinforcement learning. The main contribution of our work is a novel online (stochastic/mini-batch) alternating minimization (AM) approach for training deep neural networks, together with the first theoretical convergence guarantees for AM in stochastic settings and promising empirical results on a variety of architectures and datasets.
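The record does not reproduce the algorithm itself. As a rough illustration of the general auxiliary-variable idea summarized above (coupling each layer's activations to its weights and alternating over the resulting local subproblems on each mini-batch), the following NumPy sketch trains a two-layer network with a quadratic penalty linking the hidden activations to the first layer. The penalty formulation, the variable names (`mu`, `lr`, `A`), and the single-gradient-step local solvers are assumptions made for this sketch, not the authors' exact method.

```python
# Illustrative sketch only: online alternating minimization with an auxiliary
# activation variable A for a two-layer ReLU network (not the paper's algorithm).
import numpy as np

rng = np.random.default_rng(0)

def minibatches(n=2048, d=8, batch=32):
    """Stream a toy regression dataset in mini-batches."""
    X = rng.normal(size=(n, d))
    true_W = rng.normal(size=(d, 1))
    y = np.maximum(X @ true_W, 0.0) + 0.1 * rng.normal(size=(n, 1))
    for i in range(0, n, batch):
        yield X[i:i + batch], y[i:i + batch]

d, h = 8, 16
W1 = 0.1 * rng.normal(size=(d, h))   # first-layer weights
W2 = 0.1 * rng.normal(size=(h, 1))   # second-layer weights
mu = 1.0                             # penalty coupling A to relu(X @ W1) (assumed)
lr = 0.05                            # step size for the local subproblems (assumed)
relu = lambda z: np.maximum(z, 0.0)

for X, y in minibatches():
    # Auxiliary variable: hidden activations for this mini-batch,
    # initialized by a forward pass.
    A = relu(X @ W1)

    for _ in range(3):  # a few alternating sweeps per mini-batch
        # (1) Update A with the weights fixed:
        #     minimize ||A @ W2 - y||^2 + mu * ||A - relu(X @ W1)||^2 over A.
        grad_A = 2.0 * (A @ W2 - y) @ W2.T + 2.0 * mu * (A - relu(X @ W1))
        A -= lr * grad_A

        # (2) Update W2 with A fixed: a local least-squares subproblem,
        #     solved here by one gradient step.
        grad_W2 = 2.0 * A.T @ (A @ W2 - y) / len(X)
        W2 -= lr * grad_W2

        # (3) Update W1 with A fixed: match relu(X @ W1) to A.
        pre = X @ W1
        grad_W1 = 2.0 * mu * X.T @ ((relu(pre) - A) * (pre > 0)) / len(X)
        W1 -= lr * grad_W1

# Evaluate on a fresh batch, purely for illustration.
X, y = next(minibatches(n=256))
pred = relu(X @ W1) @ W2
print("mean squared error:", float(np.mean((pred - y) ** 2)))
```

Note that each layer's update touches only its own local subproblem once the auxiliary activations are fixed, which is what makes the per-layer updates independent of one another; the mini-batch loop is what makes this sketch "online" in the sense described in the summary.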