Generalization and Expressivity for Deep Nets

Along with the rapid development of deep learning in practice, theoretical explanations for its success become urgent. Generalization and expressivity are two widely used measurements to quantify theoretical behaviors of deep nets. The expressivity focuses on finding functions expressible by deep ne...

Full description

Saved in:

Bibliographic Details
Published in	IEEE transaction on neural networks and learning systems Vol. 30; no. 5; pp. 1392 - 1406
Main Author	Lin, Shao-Bo
Format	Journal Article
Language	English
Published	United States IEEE 01.05.2019 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Approximation Biological neural networks Deep learning expressivity generalization Learning systems Learning theory localized approximation Machine learning Mathematical analysis Nets Neurons Optimization Power measurement Risk management Sparse representation
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Along with the rapid development of deep learning in practice, theoretical explanations for its success become urgent. Generalization and expressivity are two widely used measurements to quantify theoretical behaviors of deep nets. The expressivity focuses on finding functions expressible by deep nets but cannot be approximated by shallow nets with similar number of neurons. It usually implies the large capacity. The generalization aims at deriving fast learning rate for deep nets. It usually requires small capacity to reduce the variance. Different from previous studies on deep nets, pursuing either expressivity or generalization, we consider both the factors to explore theoretical advantages of deep nets. For this purpose, we construct a deep net with two hidden layers possessing excellent expressivity in terms of localized and sparse approximation. Then, utilizing the well known covering number to measure the capacity, we find that deep nets possess excellent expressive power (measured by localized and sparse approximation) without essentially enlarging the capacity of shallow nets. As a consequence, we derive near-optimal learning rates for implementing empirical risk minimization on deep nets. These results theoretically exhibit advantages of deep nets from the learning theory viewpoint.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23
ISSN:	2162-237X 2162-2388 2162-2388
DOI:	10.1109/TNNLS.2018.2868980