Zero Shot Molecular Generation via Similarity Kernels (2402.08708v1)

Published 13 Feb 2024 in physics.chem-ph and cs.LG

Abstract: Generative modelling aims to accelerate the discovery of novel chemicals by directly proposing structures with desirable properties. Recently, score-based, or diffusion, generative models have significantly outperformed previous approaches. Key to their success is the close relationship between the score and physical force, allowing the use of powerful equivariant neural networks. However, the behaviour of the learnt score is not yet well understood. Here, we analyse the score by training an energy-based diffusion model for molecular generation. We find that during the generation the score resembles a restorative potential initially and a quantum-mechanical force at the end. In between the two endpoints, it exhibits special properties that enable the building of large molecules. Using insights from the trained model, we present Similarity-based Molecular Generation (SiMGen), a new method for zero shot molecular generation. SiMGen combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field to generate molecules without any further training. Our approach allows full control over the molecular shape through point cloud priors and supports conditional generation. We also release an interactive web tool that allows users to generate structures with SiMGen online (https://zndraw.icp.uni-stuttgart.de).

Authors (7)
  1. Rokas Elijošius (2 papers)
  2. Fabian Zills (5 papers)
  3. Ilyes Batatia (18 papers)
  4. Sam Walton Norwood (2 papers)
  5. Dávid Péter Kovács (6 papers)
  6. Christian Holm (71 papers)
  7. Gábor Csányi (84 papers)
Citations (3)

Summary

  • The paper introduces SiMGen, a method for zero-shot molecular generation that combines a time-dependent similarity kernel with descriptors from a pretrained machine learning force field.
  • An analysis of a trained energy-based diffusion model shows that the learnt score acts as a restorative potential early in generation and as a quantum-mechanical force at the end.
  • Because SiMGen requires no further training, it enables rapid, computationally efficient generation of novel molecules, with full control over molecular shape through point cloud priors.

Exploring Molecular Generation through Energy-based Diffusion Models and Similarity Kernels

Introduction to SiMGen

Recent advancements in generative modelling have opened up transformative possibilities in the design of novel molecules and materials. Among these, diffusion-based models, particularly those leveraging energy-based formulations, have shown promise in generating complex molecular structures. This article explores the intricacies and findings of a novel approach dubbed Similarity-based Molecular Generation (SiMGen), which eschews the need for training a generative model from scratch. Instead, SiMGen utilizes a time-dependent similarity kernel alongside pretrained descriptors from machine learning force fields, facilitating the generation of molecules that closely resemble a specified reference set.
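To make this concrete, here is a minimal numpy sketch of a time-dependent similarity kernel. It is a hypothetical illustration, not the authors' implementation: the Gaussian kernel form, the exponential width schedule, and the `descriptors` function (a stand-in for pretrained MACE features) are all assumptions made for this example.

```python
import numpy as np

def descriptors(positions):
    """Stand-in for pretrained MACE descriptors: here simply each atom's
    distance to the molecular centroid (illustrative only)."""
    center = positions.mean(axis=0)
    return np.linalg.norm(positions - center, axis=1, keepdims=True)

def similarity_kernel(atom_desc, ref_desc, sigma):
    """Gaussian similarity between one atom's descriptor and every
    reference-environment descriptor."""
    sq_dists = np.sum((ref_desc - atom_desc) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def log_similarity(positions, ref_desc, t, sigma0=0.5, decay=3.0):
    """Time-dependent log-similarity. The kernel width shrinks as the
    generation time t runs from 1 (pure noise) to 0 (final molecule),
    so reference environments are matched ever more strictly."""
    sigma_t = sigma0 * np.exp(decay * t)      # wide early, narrow late
    per_atom = [similarity_kernel(d, ref_desc, sigma_t).sum()
                for d in descriptors(positions)]
    return np.log(np.maximum(per_atom, 1e-12)).sum()
```

A generative force can then be obtained by differentiating `log_similarity` with respect to the atomic positions, via automatic differentiation or finite differences.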

Analyzing the Energy Landscape

A foundational part of this work is an analysis of the energy landscape shaped by the diffusion process. The trained energy-based diffusion model exhibits a smooth transition from a restorative potential at the start of generation to a quantum-mechanical force at the end, which enables the generation of stable and complex molecules. The analysis also shows that the learnt energy penalizes fragmented structures, a common pitfall in molecular generation. This understanding directly informs the design of SiMGen.
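The qualitative transition described above can be reproduced with a toy one-dimensional example. The sketch below is an illustrative construction, not taken from the paper: it evaluates the exact score of a two-peak "data" distribution convolved with a Gaussian of width sigma. At large sigma the score is approximately -x/sigma^2, a restorative pull toward the origin; at small sigma it points steeply toward the nearest data mode, analogous to a local physical force.

```python
import numpy as np

def smoothed_score(x, data, sigma):
    """Exact score d/dx log p_sigma(x), where p_sigma is a set of delta
    peaks at `data` convolved with a Gaussian of width sigma."""
    w = np.exp(-(x - data) ** 2 / (2 * sigma ** 2))  # soft assignments
    w /= w.sum()
    # Score of a Gaussian mixture: weighted pull toward each data point.
    return np.sum(w * (data - x)) / sigma ** 2

data = np.array([-1.0, 1.0])                 # two "molecule-like" modes
for sigma in (5.0, 0.1):
    xs = np.linspace(-2.0, 2.0, 5)
    print(sigma, np.round([smoothed_score(x, data, sigma) for x in xs], 3))
# Large sigma: score ~ -x / sigma^2, a restorative potential toward the mean.
# Small sigma: score points steeply toward the nearest mode, like a local force.
```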

SiMGen: A New Approach to Molecular Generation

SiMGen stands out by employing a similarity kernel that adapts to the local environments of atoms, together with descriptors from MACE, a pretrained machine learning force field. This combination allows molecules to be generated without any further training, a significant advantage in computational efficiency and flexibility. Key features of SiMGen include:

  • Full control over molecular shape through the adjustment of point cloud priors (see the sketch after this list).
  • Support for conditional generation, enabling the creation of molecules fitting specific constraints.
  • Utilization of evolutionary algorithms and point cloud priors to navigate the combinatorial complexity of chemical space effectively.
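Below is a minimal, hypothetical sketch of how a point cloud prior could be combined with the kernel-derived force in a generation loop. The harmonic pull toward the prior shape, the noise schedule, and the finite-difference force are illustrative assumptions built on the `log_similarity` sketch above; they are not the released SiMGen code.

```python
import numpy as np

def numerical_force(log_fn, positions, eps=1e-4):
    """Finite-difference gradient of a log-similarity function; a slow
    placeholder for the analytic kernel force used in practice."""
    force = np.zeros_like(positions)
    for idx in np.ndindex(positions.shape):
        shift = np.zeros_like(positions)
        shift[idx] = eps
        force[idx] = (log_fn(positions + shift)
                      - log_fn(positions - shift)) / (2.0 * eps)
    return force

def generate(n_atoms, prior_cloud, log_sim, steps=200, k_shape=1.0):
    """Toy annealed Langevin loop: the kernel force steers local chemistry,
    while a harmonic pull toward the nearest prior point fixes the shape."""
    rng = np.random.default_rng(0)
    x = rng.normal(scale=2.0, size=(n_atoms, 3))
    for step in range(steps):
        t = 1.0 - step / steps                          # anneal t: 1 -> 0
        f_kernel = numerical_force(lambda p: log_sim(p, t), x)
        # Point cloud prior: pull each atom toward its nearest prior point.
        diffs = prior_cloud[None, :, :] - x[:, None, :]
        nearest = np.argmin(np.linalg.norm(diffs, axis=2), axis=1)
        f_shape = k_shape * diffs[np.arange(n_atoms), nearest]
        noise = rng.normal(scale=0.05 * np.sqrt(t), size=x.shape)
        x += 0.01 * (f_kernel + f_shape) + noise        # Langevin-style step
    return x
```

With the earlier sketch, this could be invoked as, for example, `generate(8, prior_cloud, lambda p, t: log_similarity(p, ref_desc, t))`, where `ref_desc` is an array of reference-environment descriptors and `prior_cloud` a target shape; both names are hypothetical, introduced only for this illustration.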

Theoretical and Practical Implications

From a theoretical standpoint, SiMGen contributes to our understanding of how generative models navigate the vast chemical space. By dissecting the energy landscape and leveraging pretrained models, the method demonstrates the feasibility of "zero-shot" molecular generation: proposing novel structures without retraining a model for each new task. Practically, SiMGen offers a tool for rapidly proposing candidate molecules for further experimental or computational investigation, potentially accelerating the discovery of new materials and drugs.

Future Developments in AI-Driven Molecular Design

The SiMGen framework, with its innovative use of similarity kernels and evolutionary algorithms, represents a significant step forward in generative modelling for molecular sciences. Future developments could further refine the control mechanisms over the generation process, incorporate explicit considerations for molecular dynamics and reactivity, and expand the application scope to include materials with specific physical or chemical properties. Moreover, integrating this approach with high-throughput screening and synthesis prediction models could pave the way for a fully automated pipeline for material and molecule discovery.

Conclusion

SiMGen introduces an efficient and flexible method for molecular generation, leveraging insights from energy-based diffusion models and the power of similarity kernels. By circumventing the need for extensive training on new tasks, it offers a promising avenue for exploring chemical space and accelerating the discovery of novel molecules and materials.
