Emergent Mind

Learning Joint 2D & 3D Diffusion Models for Complete Molecule Generation

(2305.12347)
Published May 21, 2023 in q-bio.BM and cs.LG

Abstract

Designing new molecules is essential for drug discovery and material science. Recently, deep generative models that aim to model molecule distribution have made promising progress in narrowing down the chemical research space and generating high-fidelity molecules. However, current generative models only focus on modeling either 2D bonding graphs or 3D geometries, which are two complementary descriptors for molecules. The lack of ability to jointly model both limits the improvement of generation quality and further downstream applications. In this paper, we propose a new joint 2D and 3D diffusion model (JODO) that generates complete molecules with atom types, formal charges, bond information, and 3D coordinates. To capture the correlation between molecular graphs and geometries in the diffusion process, we develop a Diffusion Graph Transformer to parameterize the data prediction model that recovers the original data from noisy data. The Diffusion Graph Transformer interacts node and edge representations based on our relational attention mechanism, while simultaneously propagating and updating scalar features and geometric vectors. Our model can also be extended for inverse molecular design targeting single or multiple quantum properties. In our comprehensive evaluation pipeline for unconditional joint generation, the results of the experiment show that JODO remarkably outperforms the baselines on the QM9 and GEOM-Drugs datasets. Furthermore, our model excels in few-step fast sampling, as well as in inverse molecule design and molecular graph generation. Our code is provided in https://github.com/GRAPH-0/JODO.

Molecules generated by JODO, with 2D graphs below their 3D geometries for QM9 and GEOM-Drugs.

Overview

  • The paper introduces JODO, a deep generative model that simultaneously uses 2D bonding graphs and 3D geometries to generate molecules, integrating these perspectives to produce stable and chemically valid results.

  • JODO employs a Diffusion Graph Transformer (DGT) to parameterize the data prediction model, which recovers the original data from noisy data. The DGT uses a relational attention mechanism to let node and edge representations interact while updating scalar features and geometric vectors.

  • The model outperforms the baselines on the QM9 and GEOM-Drugs datasets, in both unconditional generation and conditional generation targeting specific quantum properties.

Overview of "Learning Joint 2D and 3D Diffusion Models for Complete Molecule Generation"

The paper presents a novel deep generative model for complete molecule generation that integrates both 2D bonding graphs and 3D geometries. This approach addresses the limitations of existing models that only focus on either 2D or 3D representations. The proposed model, termed JODO (JOint 2D and 3D Diffusion mOdel), leverages a diffusion-based framework along with a Diffusion Graph Transformer (DGT) to generate molecules considering atom types, formal charges, bond information, and 3D coordinates.
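As a rough illustration of the data-prediction setup described in the abstract (a network that recovers the original data from noisy data), the sketch below noises a feature vector and scores a predictor by mean squared error against the clean data. The cosine schedule, the function names, and the flat list-of-floats representation are assumptions made for illustration; this summary does not specify JODO's actual noise schedule, loss weighting, or data layout.

```python
import math
import random


def noise_schedule(t):
    """Assumed variance-preserving cosine schedule for t in [0, 1].

    Returns (alpha_t, sigma_t) with alpha_t**2 + sigma_t**2 == 1.
    """
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    return alpha, sigma


def add_noise(x0, t, rng):
    """Forward process: x_t = alpha_t * x_0 + sigma_t * eps, eps ~ N(0, I)."""
    alpha, sigma = noise_schedule(t)
    return [alpha * x + sigma * rng.gauss(0.0, 1.0) for x in x0]


def data_prediction_loss(predict_x0, x0, t, rng):
    """Mean squared error between the predicted and the true clean data.

    `predict_x0(x_t, t)` is a stand-in for the denoising network.
    """
    xt = add_noise(x0, t, rng)
    x0_hat = predict_x0(xt, t)
    return sum((a - b) ** 2 for a, b in zip(x0_hat, x0)) / len(x0)
```

In a joint 2D and 3D model, `x0` would stand for the concatenation of atom features, bond features, and coordinates, each possibly noised with its own schedule; that split is omitted here for brevity.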

Key Contributions

  1. Joint 2D and 3D Diffusion Model: The paper introduces JODO, which models the joint distribution of 2D molecular graphs and 3D geometries. This enables the generation of more chemically valid and geometrically stable molecules compared to models focusing solely on one aspect.
  2. Diffusion Graph Transformer (DGT): JODO is powered by the DGT, which parameterizes the data prediction model that recovers the original data from noisy data. The architecture lets node and edge representations interact through a relational attention mechanism while simultaneously propagating and updating scalar features and geometric vectors.
  3. Enhanced Generation Quality: The model demonstrates superior performance in molecule generation tasks, particularly on the QM9 and GEOM-Drugs datasets, excelling in both unconditional joint generation and conditional molecule generation targeting specific quantum properties.
  4. Versatility in Applications: Beyond unconditional generation, JODO can be extended to conditional generation targeting single or multiple quantum properties, which makes it suited to inverse molecular design.
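To make the idea of attention that mixes node and edge information while also updating geometry more concrete, here is a minimal sketch of a generic edge-biased attention layer with an equivariant-style coordinate update. It is not the paper's relational attention: it has no learned projections, the edge features are plain scalars added to the attention logits, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def edge_biased_attention(h, e):
    """Update node features with attention whose logits include an edge term.

    h: list of n node feature vectors (each of length d).
    e: n x n list of scalar edge features, used as additive logit biases.
    Returns (updated node features, n x n attention weights).
    """
    n, d = len(h), len(h[0])
    out, attn = [], []
    for i in range(n):
        logits = [
            sum(a * b for a, b in zip(h[i], h[j])) / math.sqrt(d) + e[i][j]
            for j in range(n)
        ]
        w = softmax(logits)
        attn.append(w)
        out.append([sum(w[j] * h[j][k] for j in range(n)) for k in range(d)])
    return out, attn


def update_coordinates(x, attn):
    """Move each atom along attention-weighted relative position vectors.

    Only differences x_i - x_j are used, so translating or rotating the
    input translates or rotates the output in the same way.
    """
    n = len(x)
    new_x = []
    for i in range(n):
        delta = [
            sum(attn[i][j] * (x[i][k] - x[j][k]) for j in range(n))
            for k in range(3)
        ]
        new_x.append([x[i][k] + delta[k] for k in range(3)])
    return new_x
```

Because the attention weights depend only on scalar node and edge features, they are unchanged by rigid motions of the molecule, which is the usual reason for keeping scalar and vector channels separate in geometric networks.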

Experimental Validation

Datasets and Evaluation Metrics

QM9 Dataset: This is a benchmark dataset with small molecules having up to 29 atoms including hydrogen. The paper evaluates the model against several baselines, demonstrating superior performance across various metrics such as validity, uniqueness, and molecular stability.

GEOM-Drugs Dataset: This dataset includes larger, drug-like molecules. JODO significantly outperforms existing models in terms of generating chemically valid molecules with stable 3D geometries as evidenced by the FCD and alignment metrics.

Evaluation Metrics:

  • 2D Molecular Graph Metrics: These include validity (V&C), uniqueness, novelty, and stability (Atom stable, Mol stable).
  • 3D Geometry Metrics: These entail stability metrics and Fréchet ChemNet Distance (FCD).
  • Substructure Geometry Alignment Metrics: These assess the distances between the distribution of bond lengths, angles, and dihedral angles in the generated molecules and the test set.
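The metrics listed above can be sketched roughly as follows. The first function computes validity, uniqueness, and novelty ratios from molecule strings, with `is_valid` as a hypothetical caller-supplied checker (in practice a cheminformatics toolkit would fill that role). The second compares two samples of a geometric quantity such as bond lengths using total variation distance between histograms; that distance is a stand-in chosen for simplicity, since this summary does not say which distance the paper uses. Exact metric definitions vary between papers, so treat the denominators as assumptions.

```python
def generation_metrics(generated, training_set, is_valid):
    """Validity, uniqueness (among valid), and novelty (among unique valid)."""
    n = len(generated)
    valid = [s for s in generated if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / n if n else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def histogram(values, lo, hi, bins):
    """Normalized histogram of the values that fall in [lo, hi)."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        if lo <= v < hi:
            # Clamp to guard against floating-point rounding at the top edge.
            counts[min(int((v - lo) / width), bins - 1)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * bins


def total_variation(sample_a, sample_b, lo, hi, bins):
    """Total variation distance between two histogrammed samples (0 to 1)."""
    p = histogram(sample_a, lo, hi, bins)
    q = histogram(sample_b, lo, hi, bins)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

A value of 0 from `total_variation` means the generated and reference histograms coincide at the chosen bin resolution; a value of 1 means they do not overlap at all.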

Strong Numerical Results

QM9 Dataset:

  • Atom Stable: 99.9%
  • Mol Stable: 98.8%
  • FCD: 0.138, indicating high similarity to the test distribution

GEOM-Drugs Dataset:

  • Validity (V&C): 87.4%
  • Mol Stable: 98.1%
  • FCD: 2.523

Implications and Future Directions

The joint modeling of 2D and 3D molecular descriptors in diffusion models presents significant advancements in molecule generation. The comprehensive evaluation on multiple datasets indicates that JODO not only enhances the generative performance but also opens up new avenues for applications in drug discovery and material science.

Practical Implications

  • Drug Discovery: Generating valid, high-fidelity molecules narrows the chemical search space and can speed up the discovery of drug candidates with desired properties.
  • Material Science: The ability to generate molecules with specific quantum properties has implications for the design of new materials with desired characteristics.

Theoretical Implications

  • Unified Modeling: Combining 2D and 3D representations in a single model addresses a critical gap in the generative modeling of molecules. This unified approach may inspire further research into integrating multiple complementary descriptors in other domains.
  • Diffusion Models: The application of diffusion models with enhanced architectures like the DGT may be extended to other generative tasks where complex dependencies need to be captured.

Speculation on Future Developments

The work sets a precedent for future advancements in AI-driven molecule generation. Potential areas of research and development include:

  • Fast Sampling with Equivariance: Further refinements in fast sampling techniques for equivariant diffusion models could make these models more practical for real-time applications.
  • Enhanced Conditionally Guided Generation: Incorporating more sophisticated guidance mechanisms such as classifier-free guidance could improve the quality and efficiency of conditionally generated molecules.
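For readers unfamiliar with classifier-free guidance, the sketch below shows its core step: extrapolating from an unconditional prediction toward a conditional one. Applying it directly to data predictions is an assumption made here to match the data-prediction setting; the combination weights sum to one, so for a fixed noisy input it is equivalent to the usual formulation on noise predictions. The function name and the guidance weight `w` are illustrative, and nothing here is taken from the JODO code.

```python
def classifier_free_guidance(x0_cond, x0_uncond, w):
    """Combine conditional and unconditional predictions.

    w = 0 returns the conditional prediction unchanged; larger w pushes
    the result further away from the unconditional prediction.
    """
    return [(1.0 + w) * c - w * u for c, u in zip(x0_cond, x0_uncond)]
```

In the standard recipe, one network supplies both predictions because the condition (for example a target quantum property) is randomly dropped during training.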

Conclusion

The paper "Learning Joint 2D and 3D Diffusion Models for Complete Molecule Generation" advances molecular generative modeling by integrating two complementary molecular descriptors, 2D bonding graphs and 3D geometries, in a single diffusion model. With strong reported performance in both unconditional and conditional generation, the approach is a promising basis for applications in drug discovery and material science and for further research on integrated, efficient generative models.
