Deep Generative Symbolic Regression (2401.00282v1)

Published 30 Dec 2023 in cs.LG

Abstract: Symbolic regression (SR) aims to discover concise closed-form mathematical equations from data, a task fundamental to scientific discovery. However, the problem is highly challenging because closed-form equations lie in a complex combinatorial search space. Existing methods, ranging from heuristic search to reinforcement learning, fail to scale with the number of input variables. We make the observation that closed-form equations often have structural characteristics and invariances (e.g., the commutative law) that could be further exploited to build more effective symbolic regression solutions. Motivated by this observation, our key contribution is to leverage pre-trained deep generative models to capture the intrinsic regularities of equations, thereby providing a solid foundation for subsequent optimization steps. We show that our novel formalism unifies several prominent approaches of symbolic regression and offers a new perspective to justify and improve on the previous ad hoc designs, such as the usage of cross-entropy loss during pre-training. Specifically, we propose an instantiation of our framework, Deep Generative Symbolic Regression (DGSR). In our experiments, we show that DGSR achieves a higher recovery rate of true equations in the setting of a larger number of input variables, and it is more computationally efficient at inference time than state-of-the-art RL symbolic regression solutions.
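
The pre-training stage the abstract describes can be made concrete with a minimal sketch: fit an autoregressive model over equation token sequences using a cross-entropy next-token loss, so the model absorbs the structural regularities of closed-form equations before any search or optimization happens. The sketch below is not the authors' implementation; the vocabulary, the prefix-notation toy corpus, and the model sizes (`EquationLM`, `VOCAB`) are illustrative assumptions, and DGSR itself additionally conditions generation on the observed data, which is omitted here for brevity.

```python
# Minimal sketch (assumptions throughout, not the authors' code): pre-train an
# autoregressive Transformer over equation token sequences with cross-entropy,
# as in the pre-training stage the abstract describes.
import torch
import torch.nn as nn

VOCAB = ["<pad>", "<sos>", "<eos>", "add", "mul", "sin", "x1", "x2", "c"]
TOK = {t: i for i, t in enumerate(VOCAB)}

def encode(prefix_tokens):
    """Map an equation in prefix notation, e.g. ['add', 'sin', 'x1', 'x2']
    for sin(x1) + x2, to token ids wrapped in <sos>/<eos>."""
    ids = [TOK["<sos>"]] + [TOK[t] for t in prefix_tokens] + [TOK["<eos>"]]
    return torch.tensor(ids)

class EquationLM(nn.Module):
    """A small causal Transformer language model over equation tokens."""
    def __init__(self, vocab_size, d_model=64, nhead=4, nlayers=2, max_len=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, nhead, dim_feedforward=128, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, nlayers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        seq_len = ids.size(1)
        # Causal mask: each position attends only to earlier tokens.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf")), diagonal=1)
        pos = torch.arange(seq_len, device=ids.device).unsqueeze(0)
        h = self.encoder(self.embed(ids) + self.pos(pos), mask=mask)
        return self.head(h)

# Toy stand-in for a large corpus of sampled equations in prefix notation:
# sin(x1) + x2, c * x1, and sin(x1 * x2).
corpus = [["add", "sin", "x1", "x2"],
          ["mul", "c", "x1"],
          ["sin", "mul", "x1", "x2"]]

model = EquationLM(len(VOCAB))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss(ignore_index=TOK["<pad>"])

for step in range(100):
    for eq in corpus:
        ids = encode(eq).unsqueeze(0)      # shape (1, T)
        logits = model(ids[:, :-1])        # predict each next token
        loss = loss_fn(logits.reshape(-1, len(VOCAB)),
                       ids[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
```

A model pre-trained this way can propose candidate equations by sampling token sequences autoregressively, providing the starting point for the "subsequent optimization steps" the abstract refers to; the cross-entropy objective used here is the previously ad hoc design choice the paper's formalism aims to justify.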
