InRank: Incremental Low-Rank Learning (2306.11250v2)
Abstract: The theory of greedy low-rank learning (GLRL) aims to explain the impressive generalization capabilities of deep learning. It proves that stochastic gradient-based training implicitly regularizes neural networks towards low-rank solutions through a gradual increase of the rank during training. However, there is a gap between theory and practice: GLRL requires an infinitesimal initialization of the weights, which is impractical because it is a saddle point. In this work, we remove the assumption of infinitesimal initialization by focusing on cumulative weight updates. We prove that the cumulative weight updates follow an incremental low-rank trajectory for arbitrary orthogonal initialization of the weights in a three-layer linear network. Empirically, we demonstrate that our theory holds for a broad range of neural networks (e.g., transformers) and standard training algorithms (e.g., SGD, Adam). However, existing training algorithms do not exploit the low-rank property to improve computational efficiency, since the networks are not parameterized in low rank. To remedy this, we design a new training algorithm, Incremental Low-Rank Learning (InRank), which explicitly expresses cumulative weight updates as low-rank matrices while incrementally augmenting their ranks during training. We evaluate InRank on GPT-2, and our results indicate that InRank achieves prediction performance comparable to its full-rank counterpart while requiring at most 33% of the total ranks throughout training. We also propose an efficient version of InRank that achieves a reduction of 37% in total training time and 36% in model size when training GPT-medium on WikiText-103 from scratch.
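To make the mechanism concrete, below is a minimal PyTorch-style sketch of the idea: a linear layer whose cumulative weight update is stored explicitly as a low-rank product on top of a frozen orthogonal initialization, with the rank grown on demand during training. This is an illustrative sketch, not the paper's implementation; the class name `InRankLinear`, the `maybe_grow_rank` heuristic, and all constants are assumptions made here for exposition.

```python
import torch
import torch.nn as nn


class InRankLinear(nn.Module):
    """Illustrative InRank-style layer (hypothetical, not the official code).

    The effective weight is W0 + U @ V, where W0 is a frozen orthogonal
    initialization and the trainable low-rank factors U, V represent the
    cumulative weight update. The rank of U @ V is grown incrementally."""

    def __init__(self, in_features: int, out_features: int, init_rank: int = 2):
        super().__init__()
        W0 = torch.empty(out_features, in_features)
        nn.init.orthogonal_(W0)  # frozen orthogonal initialization
        self.register_buffer("W0", W0)
        self.U = nn.Parameter(torch.zeros(out_features, init_rank))
        self.V = nn.Parameter(1e-2 * torch.randn(init_rank, in_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Effective weight = fixed initialization + low-rank cumulative update.
        return x @ (self.W0 + self.U @ self.V).t()

    @torch.no_grad()
    def maybe_grow_rank(self, grow_by: int = 1, threshold: float = 0.99) -> None:
        # Placeholder growth test (not the paper's exact criterion): if the
        # top (r - 1) singular values of the cumulative update explain less
        # than `threshold` of its total spectral mass, the current rank
        # budget looks saturated, so append fresh rank-`grow_by` components.
        # Forming the full product here is only for clarity; an efficient
        # variant would reason about the factors directly.
        r = self.U.shape[1]
        s = torch.linalg.svdvals(self.U @ self.V)[:r]
        explained = s[:-1].sum() / s.sum().clamp(min=1e-12)
        if explained < threshold:
            out_f, in_f = self.W0.shape
            self.U = nn.Parameter(torch.cat([self.U, torch.zeros(out_f, grow_by)], dim=1))
            self.V = nn.Parameter(torch.cat([self.V, 1e-2 * torch.randn(grow_by, in_f)], dim=0))
```

In a training loop one would call `maybe_grow_rank` periodically (say, every few hundred steps) and re-register the layer's parameters with the optimizer whenever the rank changes, since growing replaces `U` and `V` with new tensors.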
- Implicit Regularization in Deep Matrix Factorization. arXiv:1905.13655 [cs, stat], October 2019. URL http://arxiv.org/abs/1905.13655.
- Towards Resolving the Implicit Bias of Gradient Descent for Matrix Factorization: Greedy Low-Rank Learning. arXiv:2012.09839 [cs, stat], April 2021. URL http://arxiv.org/abs/2012.09839.
- Saddle-to-Saddle Dynamics in Deep Linear Networks: Small Initialization Training, Symmetry, and Sparsity. arXiv:2106.15933 [cs, stat], January 2022. URL http://arxiv.org/abs/2106.15933.
- Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120 [cond-mat, q-bio, stat], February 2014. URL http://arxiv.org/abs/1312.6120.
- Characterizing Implicit Bias in Terms of Optimization Geometry. In Proceedings of the 35th International Conference on Machine Learning, pages 1832–1841. PMLR, July 2018. URL https://proceedings.mlr.press/v80/gunasekar18a.html.
- On the Spectral Bias of Neural Networks. arXiv:1806.08734 [cs, stat], May 2019. URL http://arxiv.org/abs/1806.08734.
- The Implicit Bias of Depth: How Incremental Learning Drives Generalization. arXiv:1909.12051 [cs, stat], December 2019. URL http://arxiv.org/abs/1909.12051.
- Implicit Regularization in Tensor Factorization. arXiv:2102.09972 [cs, stat], June 2021. URL http://arxiv.org/abs/2102.09972.
- Drawing Early-Bird Tickets: Towards More Efficient Training of Deep Networks. arXiv:1909.11957 [cs, stat], February 2022. URL http://arxiv.org/abs/1909.11957.
- Monarch: Expressive Structured Matrices for Efficient and Accurate Training. arXiv:2204.00595 [cs], April 2022. URL http://arxiv.org/abs/2204.00595.
- Training CNNs with Low-Rank Filters for Efficient Image Classification. arXiv:1511.06744 [cs], February 2016. URL http://arxiv.org/abs/1511.06744.
- Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality Regularization and Singular Value Sparsification. In 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 2899–2908, Seattle, WA, USA, June 2020. IEEE. ISBN 978-1-72819-360-1. doi: 10.1109/CVPRW50498.2020.00347. URL https://ieeexplore.ieee.org/document/9150852/.
- Low-rank lottery tickets: finding efficient low-rank neural networks via matrix differential equations. arXiv:2205.13571 [cs, math, stat], May 2022. URL http://arxiv.org/abs/2205.13571.
- Low-Rank Compression of Neural Nets: Learning the Rank of Each Layer. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 8049–8059, 2020. URL https://openaccess.thecvf.com/content_CVPR_2020/html/Idelbayev_Low-Rank_Compression_of_Neural_Nets_Learning_the_Rank_of_Each_CVPR_2020_paper.html.
- Cuttlefish: Low-rank Model Training without All The Tuning. arXiv:2305.02538 [cs], May 2023. URL http://arxiv.org/abs/2305.02538.
- LoRA: Low-Rank Adaptation of Large Language Models. arXiv:2106.09685 [cs], October 2021. URL http://arxiv.org/abs/2106.09685.
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization. In Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper/2019/hash/d9fbed9da256e344c1fa46bb46c34c5f-Abstract.html.
- Pufferfish: Communication-efficient Models At No Extra Cost. In Proceedings of Machine Learning and Systems (MLSys), 2021.
- Measuring the Intrinsic Dimension of Objective Landscapes. In International Conference on Learning Representations, 2018.
- Incremental Spectral Learning in Fourier Neural Operator. arXiv:2211.15188, November 2022. URL https://arxiv.org/abs/2211.15188v3.
- PINNup: Robust neural network wavefield solutions using frequency upscaling and neuron splitting. arXiv:2109.14536 [physics], September 2021. URL http://arxiv.org/abs/2109.14536.
- Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In 2015 IEEE International Conference on Computer Vision (ICCV), pages 1026–1034, Santiago, Chile, December 2015. IEEE. ISBN 978-1-4673-8391-2. doi: 10.1109/ICCV.2015.123. URL http://ieeexplore.ieee.org/document/7410480/.
- ZerO Initialization: Initializing Residual Networks with only Zeros and Ones. arXiv:2110.12661 [cs], October 2021. URL http://arxiv.org/abs/2110.12661.
Authors: Jiawei Zhao, Yifei Zhang, Beidi Chen, Florian Schäfer, Anima Anandkumar