A survey of regularization strategies for deep models

Bibliographic Details
Published in: The Artificial Intelligence Review, Vol. 53, No. 6, pp. 3947–3986
Main Authors: Moradi, Reza; Berangi, Reza; Minaei, Behrouz
Format: Journal Article
Language: English
Published: Dordrecht: Springer Netherlands, 01.08.2020

Summary: The most critical concern in machine learning is how to build an algorithm that performs well both on training data and on new data. The no free lunch theorem implies that each specific task needs its own tailored machine learning algorithm. A set of strategies and preferences is built into learning machines to tune them for the problem at hand. These strategies and preferences, whose core concern is improving generalization, are collectively known as regularization. In deep learning, because of the considerable number of parameters, a great many regularization methods are available to the community. Developing more effective regularization strategies has been the subject of significant research effort in recent years. However, it is difficult for developers to choose the most suitable strategy for the problem at hand, because there is no comparative study of the performance of the different strategies. In this paper, as a first step, the most effective regularization methods and their variants are presented and analyzed in a systematic way. As a second step, a comparative study of regularization techniques is presented in which test errors and computational costs are evaluated for a convolutional neural network on the CIFAR-10 dataset ( https://www.cs.toronto.edu/~kriz/cifar.html ). Finally, the regularization methods are compared in terms of network accuracy, the number of epochs needed to train the network, and the number of operations per input sample, and the results are discussed and interpreted in light of the strategy each method employs. The experimental results show that weight decay and data augmentation incur little computational overhead, so they can be used in most applications. Given sufficient computational resources, Dropout-family methods are a rational choice; given abundant resources, batch-normalization-family and ensemble methods are also reasonable strategies.
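As a concrete illustration of the regularizer families the survey compares, the sketch below combines weight decay (an L2 penalty on convolution kernels), data augmentation, Dropout, and batch normalization in a small Keras CNN trained on CIFAR-10. This is a minimal example, not the architecture or hyperparameters used in the paper; the layer sizes, penalty strength, and dropout rate are placeholder assumptions.

import tensorflow as tf
from tensorflow.keras import layers, regularizers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    # Data augmentation: active during training only, a no-op at inference.
    layers.RandomFlip("horizontal"),
    layers.RandomTranslation(0.1, 0.1),
    # Weight decay: an L2 penalty on the convolution kernel
    # (the strength 1e-4 is an assumption, not the paper's value).
    layers.Conv2D(32, 3, padding="same",
                  kernel_regularizer=regularizers.l2(1e-4)),
    layers.BatchNormalization(),   # batch-normalization family
    layers.Activation("relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dropout(0.5),           # Dropout family
    layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# CIFAR-10, the dataset used in the survey's experiments.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
model.fit(x_train / 255.0, y_train,
          validation_data=(x_test / 255.0, y_test),
          epochs=5, batch_size=128)

Note that the augmentation layers and Dropout add per-step cost only at training time, while the L2 penalty adds almost none, which is consistent with the abstract's observation that weight decay and data augmentation have little computational side effect.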
ISSN: 0269-2821, 1573-7462
DOI: 10.1007/s10462-019-09784-7