知の物理学研究センター / Institute for Physics of Intelligence (iπ)
“Generalization Analysis of Deep Learning: Implicit Regularization and Over-parameterization”
Abstract: Deep learning achieves high generalization performance, but the theoretical understanding of why it does so is still developing. In this talk, I will present two theoretical results on this topic: (i) implicit regularization induced by the loss surface, and (ii) double descent for deep models.
(i) Implicit regularization is the idea that a learning algorithm implicitly constrains the degrees of freedom of a neural network. However, the specific form of implicit regularization achieved by deep neural networks has not been clarified. In this work, we show theoretically that when a loss surface has many local minima satisfying certain assumptions, its shape constrains the learning algorithm so that regularization is achieved. In this case, we also show that the generalization error of deep neural networks has an upper bound that is independent of the number of parameters.
(ii) Asymptotic risk analysis, including double descent, is a theoretical framework for analyzing the generalization error of over-parameterized models. Although it has attracted strong attention, it can analyze only models that are linear in their features, such as random feature models. We show that, for a family of models without linearity constraints, an upper bound on the generalization error follows the theory of asymptotic risk. By investigating our regularity condition, we show that specific nonlinear models, such as parallelized deep neural networks, obey our result.
Organizers: Tilman HARTWIG, 髙橋昂, 中西健 (Institute for Physics of Intelligence)