
SubGDiff: A Subgraph Diffusion Model to Improve Molecular Representation Learning (2405.05665v1)

Published 9 May 2024 in cs.LG and q-bio.QM

Abstract: Molecular representation learning has shown great success in advancing AI-based drug discovery. Many recent works build on the fact that the 3D geometric structure of molecules provides essential information about their physical and chemical characteristics. Recently, denoising diffusion probabilistic models have achieved impressive performance in 3D molecular representation learning. However, most existing molecular diffusion models treat each atom as an independent entity, overlooking the dependency among atoms within molecular substructures. This paper introduces a novel approach that enhances molecular representation learning by incorporating substructural information into the diffusion process. We propose a novel diffusion model, termed SubGDiff, that involves molecular subgraph information in diffusion. Specifically, SubGDiff adopts three vital techniques: i) subgraph prediction, ii) expectation state, and iii) k-step same subgraph diffusion, to enhance the perception of molecular substructure in the denoising network. Extensive experiments on downstream tasks demonstrate the superior performance of our approach. The code is available at https://github.com/youjibiying/SubGDiff.
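To make the core idea concrete, the following is a minimal, illustrative sketch (not the authors' actual implementation) of a forward diffusion step that noises only the atoms in a selected subgraph, plus a k-step loop that reuses the same subgraph mask, loosely mirroring the "k-step same subgraph diffusion" technique named in the abstract. The function names, the binary `mask` representation, and the simple variance-preserving noise step are all assumptions for illustration.

```python
import numpy as np

def subgraph_forward_step(x, mask, beta, rng):
    """One forward diffusion step applied only to masked atoms.

    x    : (n_atoms, 3) array of 3D atom coordinates
    mask : (n_atoms,) binary array, 1 for atoms in the chosen subgraph
    beta : noise-schedule value for this step
    rng  : numpy random Generator
    """
    noise = rng.standard_normal(x.shape)
    # Standard variance-preserving DDPM-style step for the masked atoms.
    noised = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * noise
    # Atoms outside the subgraph keep their positions unchanged.
    m = mask[:, None]
    return m * noised + (1.0 - m) * x

def k_step_same_subgraph(x, mask, betas, seed=0):
    """Apply several consecutive steps with the SAME subgraph mask,
    sketching the k-step same-subgraph idea at a toy level."""
    rng = np.random.default_rng(seed)
    for beta in betas:
        x = subgraph_forward_step(x, mask, beta, rng)
    return x
```

In this toy version, the denoising network would then be trained to predict both the injected noise and the subgraph mask itself (the "subgraph prediction" technique), which is what pushes substructure awareness into the learned representation.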

Authors (4)
  1. Jiying Zhang (12 papers)
  2. Zijing Liu (22 papers)
  3. Yu Wang (939 papers)
  4. Yu Li (378 papers)
Citations (1)