Depth Selection for Deep ReLU Nets in Feature Extraction and Generalization

Bibliographic Details
Published in	IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, no. 4, pp. 1853-1868
Main Authors	Han, Zhi; Yu, Siquan; Lin, Shao-Bo; Zhou, Ding-Xuan
Format	Journal Article
Language	English
Published	United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.04.2022
Subjects	Algorithms; Artificial neural networks; Cognitive tasks; Data mining; Deep learning; Deep nets; Earthquake prediction; Feature extraction; feature extractions; generalization; Humans; learning theory; Machine Learning; Machine learning algorithms; Neural Networks, Computer; Optimization; Pattern recognition; Task analysis

Summary:	Deep learning is recognized as capable of discovering deep features for representation learning and pattern recognition without requiring elegant feature engineering techniques that take advantage of human ingenuity and prior knowledge. It has thus triggered enormous research activity in machine learning and pattern recognition. One of the most important challenges in deep learning is to determine the relationship between a feature and the depth of deep neural networks (deep nets for short), so as to reflect the necessity of depth. Our purpose is to quantify this feature-depth correspondence in feature extraction and generalization. We present the adaptivity of features to depths, and vice versa, by showing a depth-parameter trade-off in extracting both single and composite features. Based on these results, we prove that implementing classical empirical risk minimization on deep nets can achieve optimal generalization performance for numerous learning tasks. Our theoretical results are verified by a series of numerical experiments, including toy simulations and a real application to earthquake seismic intensity prediction.
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2020.3032422