Equivalence between dropout and data augmentation: A mathematical check
Published in | Neural Networks, Vol. 115, pp. 82-89 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | United States: Elsevier Ltd, 01.07.2019 |
Summary: | The great achievements of deep learning can be attributed to its power of feature representation, which comes from nonlinear activation functions and large numbers of network nodes. However, deep neural networks suffer from serious issues such as slow convergence, and dropout is an effective method for improving a network's generalization ability and test performance. Many explanations have been given for why dropout works so well; among them, the equivalence between dropout and data augmentation is a recently proposed and stimulating one. In this article, we discuss the exact conditions under which this equivalence holds. Our main result guarantees that the equivalence holds almost surely if the dimension of the input space is equal to or higher than that of the output space. Furthermore, if the commonly used rectified linear unit (ReLU) activation function is replaced by a newly proposed activation function whose values lie in ℝ, our results extend to multilayer neural networks. For comparison, counterexamples are given for the inequivalent case. Finally, a series of experiments on the MNIST dataset is conducted to illustrate and help understand the theoretical results. |
---|---|
ISSN: | 0893-6080, 1879-2782 |
DOI: | 10.1016/j.neunet.2019.03.013 |
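
The dimension condition in the summary can be illustrated with a small numerical sketch. This is not the article's own construction. It assumes one common reading of the equivalence: for a single ReLU layer with weights `W` (output dimension by input dimension), a dropout mask applied to the layer's output corresponds to some modified input `x_star` whose clean forward pass gives the same activations. The names `augmented_input`, `x_star`, and the random test data are hypothetical, and the usual 1/(1-p) rescaling of dropout is omitted for simplicity.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def augmented_input(W, b, x, mask):
    """Return x_star such that relu(W @ x_star + b) == mask * relu(W @ x + b).

    Assumes W (shape d_out x d_in) has full row rank, which holds almost
    surely for a random Gaussian W when d_in >= d_out.
    """
    target = mask * relu(W @ x + b)  # activations after dropout (all >= 0)
    # Minimum-norm solution of W @ x_star = target - b via the pseudo-inverse.
    return np.linalg.pinv(W) @ (target - b)

rng = np.random.default_rng(0)
d_in, d_out = 8, 5  # input dimension >= output dimension
W = rng.standard_normal((d_out, d_in))
b = rng.standard_normal(d_out)
x = rng.standard_normal(d_in)
mask = rng.integers(0, 2, size=d_out).astype(float)  # 1 = keep, 0 = drop

x_star = augmented_input(W, b, x, mask)
dropped = mask * relu(W @ x + b)            # dropout applied to the output
clean_on_augmented = relu(W @ x_star + b)   # no dropout, modified input
max_gap = float(np.max(np.abs(clean_on_augmented - dropped)))
print(max_gap)  # expected to be at floating-point rounding level
```

When `d_in` is smaller than `d_out`, `W` cannot have full row rank, so this pseudo-inverse construction generally no longer reproduces the dropped-out activations; this is consistent with the summary's remark that counterexamples exist for the inequivalent case.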