Scaling and Resizing Symmetry in Feedforward Networks (2306.15015v1)
Abstract: Weight initialization in deep neural networks has a strong impact on the convergence speed of the learning process. Recent studies have shown that, in the case of random initialization, a chaos/order phase transition occurs in the space of variances of the random weights and biases. Subsequent experiments have shown that large improvements in training speed can be obtained by initializing a neural network on values along the critical line of this phase transition. In this contribution, we show evidence that the scaling property exhibited by physical systems at criticality is also present in untrained feedforward networks with random weights initialized on the critical line. Additionally, we suggest a data-resizing symmetry, which is directly inherited from the scaling symmetry at criticality.
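The chaos/order transition the abstract refers to can be illustrated numerically with the standard mean-field setup: weights drawn i.i.d. from N(0, σ_w²/N) and biases from N(0, σ_b²). The sketch below (not from the paper; the tanh nonlinearity, the specific σ_w/σ_b values, and the cosine-similarity diagnostic are illustrative assumptions) pushes two independent inputs through the same random network. In the ordered phase (small σ_w) the two signals collapse onto each other, while in the chaotic phase (large σ_w) they remain decorrelated; the critical line separates the two regimes.

```python
import numpy as np

def final_layer_similarity(sigma_w, sigma_b=0.1, width=500, depth=30, seed=0):
    """Push two independent inputs through the SAME random tanh network
    and return the cosine similarity of the final hidden layer."""
    rng = np.random.default_rng(seed)
    h1 = rng.normal(size=width)
    h2 = rng.normal(size=width)  # initially ~uncorrelated with h1
    for _ in range(depth):
        # Shared random layer: W_ij ~ N(0, sigma_w^2 / width), b_i ~ N(0, sigma_b^2)
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width)
        h1 = np.tanh(W @ h1 + b)
        h2 = np.tanh(W @ h2 + b)
    return float(h1 @ h2 / (np.linalg.norm(h1) * np.linalg.norm(h2)))

# Ordered phase (sigma_w < 1): inputs are forgotten, signals merge.
ordered = final_layer_similarity(sigma_w=0.5)
# Chaotic phase (sigma_w well above the critical line): signals stay decorrelated.
chaotic = final_layer_similarity(sigma_w=2.0)
```

For a tanh network with σ_b → 0 the critical line sits at σ_w = 1, where correlations are neither forced to 1 nor destroyed, which is the regime whose scaling behavior the paper studies.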
Author: Carlos Cardona