Continual learning for efficient machine learning

Bibliographic Details
Main Author: Chaudhry, Arslan
Format: Dissertation
Language: English
Published: University of Oxford, 2020

Summary: Deep learning has enjoyed tremendous success over the last decade, but the training of practically useful deep models remains highly inefficient, both in terms of the number of weight updates and the number of training samples. To address one aspect of these issues, this thesis studies the continual learning setting, whereby a model learns from a sequence of tasks, leveraging previous knowledge to learn new tasks quickly. The main challenge in continual learning is to keep the model from catastrophically forgetting previous information when it is updated for a new task. Towards this, the thesis first proposes a continual learning algorithm that preserves previous knowledge by regularizing the KL-divergence between the conditional likelihoods of two successive tasks. It is shown that this regularization imposes a quadratic penalty on the network weights based on the curvature at the minimum of the previous task. Second, the thesis presents a more efficient continual learning algorithm that uses an episodic memory of past tasks as a constraint, such that the loss on the episodic memory does not increase when a weight update is made for a new task. It is shown that using episodic memory to constrain the objective is more effective than regularizing the network parameters. Furthermore, to increase the speed of learning on new tasks, the use of compositional task descriptors in a joint embedding model is proposed, which greatly improves forward transfer. The episodic-memory-based continual learning objective is then simplified by using the memory directly in the loss function. Despite its tendency to memorize the data present in the tiny episodic memory, the resulting algorithm is shown to generalize better than the one where memory is used as a constraint. An analysis is proposed that attributes this surprising generalization to the regularization effect brought by the data of new tasks. This algorithm is then used to learn continually from synthetic and real data. For this, a method is proposed that generates synthetic data points for each task by optimizing the forgetting loss in hindsight on the replay buffer. A nested optimization objective for continual learning is devised that effectively utilizes these synthetic points to reduce forgetting in memory-based continual learning methods. Finally, the thesis presents a continual learning algorithm that learns different tasks in non-overlapping feature subspaces. It is shown that minimizing the overlap by keeping the subspaces of different tasks orthogonal to each other reduces the interference between the representations of these tasks.
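
To make the first contribution concrete, the following is a minimal sketch of the quadratic penalty that a KL-divergence regularizer reduces to under a second-order (Laplace-style) approximation around the previous task's minimum. The diagonal curvature estimate (fisher_diag), the stored previous-task parameters (old_params) and the weight lam are illustrative names and assumptions for this sketch, not the thesis's exact implementation.

import torch

def quadratic_penalty(model, old_params, fisher_diag, lam):
    # Penalize movement away from the previous task's minimum, weighted by a
    # diagonal curvature (Fisher) estimate at that minimum. old_params and
    # fisher_diag are assumed to be dicts keyed by parameter name.
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + (fisher_diag[name] * (p - old_params[name]) ** 2).sum()
    return lam * penalty

# Illustrative total objective for the new task:
# loss = new_task_loss + quadratic_penalty(model, old_params, fisher_diag, lam=1.0)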
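
The episodic-memory constraint described above is commonly enforced by gradient projection: if a proposed update would increase the loss on the memory (a negative inner product with the memory gradient), the update is projected so that, to first order, the memory loss cannot increase. The sketch below assumes flattened gradient vectors and is only one way to realize such a constraint.

import torch

def constrained_update_direction(grad_new, grad_mem):
    # grad_new: flattened gradient of the new-task loss
    # grad_mem: flattened gradient of the loss on the episodic memory
    # Project grad_new onto the half-space where the memory loss does not
    # increase (to first order) whenever the two gradients conflict.
    dot = torch.dot(grad_new, grad_mem)
    if dot < 0:
        grad_new = grad_new - (dot / torch.dot(grad_mem, grad_mem)) * grad_mem
    return grad_new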
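
For the final contribution, one simple way to obtain non-overlapping feature subspaces is to assign each task a disjoint block of columns from a single orthonormal basis and project that task's features onto its block. The helper below is a hedged illustration of this idea (it assumes num_tasks * dims_per_task <= feature_dim), not the thesis's exact construction.

import torch

def orthogonal_task_subspaces(feature_dim, num_tasks, dims_per_task):
    # Build one orthonormal basis of the feature space and give each task a
    # disjoint block of columns, so the task subspaces are mutually orthogonal.
    q, _ = torch.linalg.qr(torch.randn(feature_dim, feature_dim))
    return [q[:, t * dims_per_task:(t + 1) * dims_per_task] for t in range(num_tasks)]

# Features h of task t can then be projected as P[t] @ P[t].T @ h before the
# task-specific head, limiting interference between task representations.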
Bibliography: Rhodes Trust; Amazon Research Award