Kronecker-Factored Approximate Curvature for Physics-Informed Neural Networks (2405.15603v3)
Abstract: Physics-informed neural networks (PINNs) are infamous for being hard to train. Recently, second-order methods based on natural gradient and Gauss-Newton methods have shown promising performance, improving the accuracy achieved by first-order methods by several orders of magnitude. While promising, the proposed methods only scale to networks with a few thousand parameters due to the high computational cost of evaluating, storing, and inverting the curvature matrix. We propose Kronecker-factored approximate curvature (KFAC) for PINN losses that greatly reduces the computational cost and allows scaling to much larger networks. Our approach goes beyond the established KFAC for traditional deep learning problems as it captures contributions from a PDE's differential operator that are crucial for optimization. To establish KFAC for such losses, we use Taylor-mode automatic differentiation to describe the differential operator's computation graph as a forward network with shared weights. This allows us to apply KFAC thanks to a recently developed general formulation for networks with weight sharing. Empirically, we find that our KFAC-based optimizers are competitive with expensive second-order methods on small problems, scale more favorably to higher-dimensional neural networks and PDEs, and consistently outperform first-order methods and LBFGS.
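To make the ingredients concrete, below is a minimal JAX sketch of the two pieces the abstract refers to: a pointwise PINN residual for a toy 2D Poisson problem, and the classical per-layer Kronecker factors that KFAC uses to approximate a linear layer's Gauss-Newton block. All names (`mlp`, `interior_residual`, `kronecker_factors`) and the toy setup are ours; the Laplacian here is computed with nested forward-over-reverse differentiation rather than the Taylor-mode formulation the paper builds on, and the Kronecker factors shown are the standard KFAC ones for a linear layer, not the paper's extension that also captures the differential-operator term.

```python
# Minimal sketch (our own toy setup and names, not the paper's code): a PINN
# interior residual for the 2D Poisson equation -Δu = f, plus the standard
# per-layer Kronecker factors that KFAC uses to approximate a linear layer's
# Gauss-Newton block as a Kronecker product A ⊗ G.
import jax
import jax.numpy as jnp


def mlp(params, x):
    """Small tanh MLP mapping a 2D point to a scalar candidate solution u(x)."""
    h = x
    for W, b in params[:-1]:
        h = jnp.tanh(W @ h + b)
    W, b = params[-1]
    return (W @ h + b)[0]


def laplacian(params, x):
    """Δu(x) as the trace of the Hessian, via forward-over-reverse AD."""
    grad_u = lambda y: jax.grad(mlp, argnums=1)(params, y)

    def hess_diag(v):
        _, hv = jax.jvp(grad_u, (x,), (v,))  # Hessian-vector product H v
        return v @ hv

    return sum(hess_diag(e) for e in jnp.eye(x.shape[0]))


def interior_residual(params, x, f_rhs):
    """Pointwise residual of -Δu = f at an interior collocation point."""
    return -laplacian(params, x) - f_rhs(x)


def kronecker_factors(acts, grads):
    """Classical KFAC factors for one linear layer.

    `acts` (N x d_in) are the layer inputs and `grads` (N x d_out) the
    output-side gradients over a batch; the layer's curvature block is
    approximated by jnp.kron(A, G) instead of being formed explicitly.
    """
    A = acts.T @ acts / acts.shape[0]
    G = grads.T @ grads / grads.shape[0]
    return A, G


# Toy usage: random parameters, one collocation point, rhs f(x) = sin(pi x1) sin(pi x2).
sizes = [2, 16, 16, 1]
keys = jax.random.split(jax.random.PRNGKey(0), len(sizes) - 1)
params = [
    (0.5 * jax.random.normal(k, (m, n)), jnp.zeros(m))
    for k, n, m in zip(keys, sizes[:-1], sizes[1:])
]
x0 = jnp.array([0.3, 0.7])
r = interior_residual(params, x0, lambda x: jnp.prod(jnp.sin(jnp.pi * x)))
```

The point of the second function is the memory argument from the abstract: storing and inverting the two small factors A (d_in x d_in) and G (d_out x d_out) is far cheaper than doing so for the full (d_in·d_out) x (d_in·d_out) curvature block they approximate.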
Authors: Felix Dangel, Johannes Müller, Marius Zeinhofer