[finished] ipi seminar 10:30-12:00, Thursday Oct. 28, 2021
知の物理学研究センター / Institute for Physics of Intelligence (ipi)
【Date】10:30-12:00 JST, Thursday, Oct. 28, 2021
【Title】"Design space of a deep neural network - its spatial evolution and robustness"
Deep neural networks have a large number of free parameters, such as synaptic weights. Even with a fixed architecture, an extremely large number of different machines can be generated by varying the parameters. Learning amounts to focusing on the subset of them that meets the constraints imposed by a set of training data. As the number of training data increases, the design space, i.e. the phase space of valid machines, shrinks. It may also split into clusters (a glass transition) if the frustration effect of the constraints is severe. The Parisi order parameter function [1], developed first for spin glasses, is useful for capturing the nature of such a complex phase space.
In this talk, I discuss how one can extend the statistical mechanics approach pioneered by E. Gardner for a single perceptron [2], based on the replica method, to a multi-layered, deep perceptron network [3]. This amounts to constructing a theory in which the Parisi order parameter function is allowed to evolve in space. Specifically, I discuss two scenarios: (1) random training data and (2) a teacher-student setting. In both cases, we found that the magnitude of the order parameter evolves in space as in 'wetting transitions': it is larger closer to the input/output boundaries, suggesting that the effect of the constraints imposed by the training data is stronger there. If the system is deep enough, the central region remains in the liquid phase, meaning that the design space remains very large there. Furthermore, in scenario (1) we found a peculiar replica symmetry breaking (RSB) which evolves in space: the design space is clustered in a complex, hierarchical manner near the boundaries, and the clustering becomes progressively simpler approaching the center. More recently, we found the same type of spatially evolving RSB in scenario (2) in the presence of noise in the training data. However, the latter RSB disappears if the network is made deep enough that the liquid phase survives in the center. Finally, I discuss the implications of these theoretical results for deep learning in practice.
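For context, the central object in the Gardner approach is the volume of the design space compatible with the training data. A sketch in standard notation (illustrative only, not taken from the talk) for a single perceptron with N weights w and M training examples (x^mu, y^mu) reads:

```latex
% Gardner volume: the fraction of weight space whose machines classify
% all M training examples correctly (theta is the Heaviside step function)
V = \int \mathrm{d}\mu(\mathbf{w}) \,
    \prod_{\mu=1}^{M}
    \theta\!\left( \frac{y^{\mu}\, \mathbf{w}\cdot\mathbf{x}^{\mu}}{\sqrt{N}} \right),
\qquad
\overline{\ln V} = \lim_{n\to 0} \frac{\overline{V^{n}} - 1}{n}
```

The replica identity on the right is how the average of ln V over random training data is evaluated: one computes the average of V^n for integer n and continues to n -> 0. The Parisi order parameter function then describes the distribution of overlaps between replicated weight configurations, which is what characterizes the structure (clustered or not) of the design space.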
[1] G. Parisi, Phys. Rev. Lett. 43, 1754 (1979). A key work behind this year's Nobel Prize in Physics.
[2] E. Gardner, J. Phys. A: Math. Gen. 21, 257 (1988); E. Gardner and B. Derrida, J. Phys. A: Math. Gen. 22, 1983 (1989).
[3] H. Yoshino, SciPost Phys. Core 2, 005 (2020).