Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties (2310.04463v1)

Published 5 Oct 2023 in q-bio.BM, cs.AI, and cs.LG

Abstract: In the past decade, Artificial Intelligence-driven drug design and discovery has been a hot research topic, and an important branch of it is molecule generation by generative models, from GAN-based and VAE-based models to the latest diffusion-based models. However, most existing models pursue only basic properties such as the validity and uniqueness of the generated molecules, and only a few go further to explicitly optimize a single important molecular property (e.g., QED or PlogP), which leaves most generated molecules of little use in practice. In this paper, we present a novel approach to generating molecules with desirable properties, which extends the diffusion model framework with multiple innovative designs. The novelty is two-fold. On the one hand, considering that the structures of molecules are complex and diverse, and that molecular properties are usually determined by certain substructures (e.g., pharmacophores), we propose to perform diffusion on two structural levels, molecules and molecular fragments, from which a mixed Gaussian distribution is obtained for the reverse diffusion process. To obtain desirable molecular fragments, we develop a novel electronic effect-based fragmentation method. On the other hand, we introduce two ways to explicitly optimize multiple molecular properties under the diffusion model framework. First, as potential drug molecules must be chemically valid, we optimize molecular validity via an energy-guidance function. Second, since potential drug molecules should be desirable in various properties, we employ a multi-objective mechanism to optimize multiple molecular properties simultaneously. Extensive experiments on two benchmark datasets, QM9 and ZINC250k, show that the molecules generated by our proposed method have better validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP than those generated by current SOTA models.


Summary

The paper introduces a dual-level diffusion model with a Pareto-based multi-objective framework to optimize multiple molecular properties simultaneously.
The approach combines whole-molecule and fragment-level diffusion using a mixed Gaussian distribution to capture complex chemical structures accurately.
Energy guidance steers generation toward chemically valid structures, and the method reaches 100% validity on the benchmark datasets, supporting the generation of chemically sound and practically useful molecules.


In "Diffusing on Two Levels and Optimizing for Multiple Properties: A Novel Approach to Generating Molecules with Desirable Properties," Guo et al. present a sophisticated method aimed at addressing the limitations of traditional generative models in the domain of AI-driven drug design. The core objective is to generate molecules not only with high validity and uniqueness but with multiple optimized properties simultaneously, making them more practical for real-world drug discovery.

The authors propose a dual-level diffusion model combined with a multi-objective optimization framework. Unlike single-level diffusion models, it accounts for the complex and diverse structures of molecules by operating on both whole molecules and their fragments. This dual-level design is intended to yield a more accurate mixed Gaussian distribution for the reverse diffusion process.

Methodology

Molecule Fragmentation Based on Electronic Effect (FREE)

The FREE method fragments molecules based on electronic effects, a physicochemical concept closely tied to acidity and basicity. The fragmentation is designed to preserve substructures such as pharmacophores, which play a large role in determining molecular properties. FREE segments each molecule into ring and chain substructures and matches them against a substituent table to build a fragment vocabulary, with the goal of producing fragments that are informative about the target molecular properties.
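FREE's actual cutting rules (which bonds to break based on electronic effects) and its substituent table are not given here. The following is only a minimal sketch, assuming a molecule is a plain list of element symbols plus a bond list. It separates ring atoms from chain atoms (a bond lies on a ring exactly when it is not a bridge) and then looks up each chain fragment in a small, hypothetical `SUBSTITUENT_TABLE`. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical substituent table (illustrative only, not the paper's table):
# maps the sorted element symbols of a chain fragment to a substituent name.
SUBSTITUENT_TABLE = {
    ("O",): "hydroxyl",
    ("N",): "amino",
    ("C",): "methyl",
    ("C", "O", "O"): "carboxyl",
}


def find_bridges(n_atoms, bonds):
    """Return the bridge bonds (as frozensets) using Tarjan's low-link DFS.

    A bond lies on a ring exactly when it is not a bridge.
    """
    adj = defaultdict(list)
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    disc, low, bridges = {}, {}, set()
    counter = [0]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        for v in adj[u]:
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:
                    bridges.add(frozenset((u, v)))
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for atom in range(n_atoms):
        if atom not in disc:
            dfs(atom, -1)
    return bridges


def connected_components(atoms, edges):
    """Connected components of the subgraph induced by `atoms` over `edges`."""
    adj = defaultdict(set)
    for a, b in edges:
        if a in atoms and b in atoms:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in sorted(atoms):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(sorted(comp))
    return comps


def fragment(elements, bonds):
    """Split a molecule into ring systems and labelled chain fragments."""
    bridges = find_bridges(len(elements), bonds)
    ring_bonds = [b for b in bonds if frozenset(b) not in bridges]
    ring_atoms = {a for bond in ring_bonds for a in bond}
    chain_atoms = set(range(len(elements))) - ring_atoms
    rings = connected_components(ring_atoms, ring_bonds)
    chains = []
    for chain in connected_components(chain_atoms, bonds):
        key = tuple(sorted(elements[i] for i in chain))
        chains.append((chain, SUBSTITUENT_TABLE.get(key, "unmatched")))
    return rings, chains


# Phenol: a six-carbon ring (atoms 0-5) with an oxygen (atom 6) on atom 0.
phenol_elements = ["C"] * 6 + ["O"]
phenol_bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 6)]
rings, chains = fragment(phenol_elements, phenol_bonds)
print(rings)   # [[0, 1, 2, 3, 4, 5]]
print(chains)  # [([6], 'hydroxyl')]
```

The ring/chain split depends only on graph topology. An electronic-effect criterion, as in FREE, would further decide where chains are cut and which fragments enter the vocabulary.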

Diffusion on Two Molecular Structural Levels (D2L)

The D2L model applies diffusion to both molecular graphs and molecular fragments. A forward diffusion process gradually adds noise to the representations at each level until they approach a Gaussian distribution. The Gaussians obtained for whole molecules and for fragments are then combined into a mixed Gaussian distribution, which drives the reverse diffusion process that generates new molecular structures. By modeling both levels, D2L is intended to capture chemical space more comprehensively than single-level models.
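To make the two-level idea concrete, here is a minimal sketch that works on plain real-valued feature vectors rather than molecular graphs. It uses the standard DDPM closed-form forward step, x_t ~ N(sqrt(ᾱ_t)·x_0, (1 − ᾱ_t)·I), under an assumed linear β schedule. A reverse step then samples from a two-component Gaussian mixture. The mixing weight, the shared σ, and the placeholder means (which a real model would obtain from its molecule-level and fragment-level denoisers) are all hypothetical. How the paper actually weights the two levels is not specified here.

```python
import math
import random


def alpha_bar(t, T, beta_min=1e-4, beta_max=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t under an assumed linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * s / T
        prod *= 1.0 - beta
    return prod


def forward_noise(x0, t, T, rng):
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) * I) for a feature vector x0."""
    ab = alpha_bar(t, T)
    return [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for x in x0]


def mixture_mean(mu_mol, mu_frag, weight):
    """Mean of the two-component mixture: weight * mu_mol + (1 - weight) * mu_frag."""
    return [weight * a + (1.0 - weight) * b for a, b in zip(mu_mol, mu_frag)]


def sample_mixture(mu_mol, mu_frag, sigma, weight, rng):
    """Draw one reverse-step sample from the mixture of the two levels.

    With probability `weight` the molecule-level Gaussian is chosen, otherwise the
    fragment-level one; both components share the isotropic std `sigma` here.
    """
    mu = mu_mol if rng.random() < weight else mu_frag
    return [m + sigma * rng.gauss(0.0, 1.0) for m in mu]


rng = random.Random(0)
x0_mol = [1.0, -0.5, 0.3]  # hypothetical molecule-level features
xt_mol = forward_noise(x0_mol, t=500, T=1000, rng=rng)

# Stand-ins for the two denoisers' predicted means at some reverse step.
mu_mol_hat = [0.9, -0.4, 0.2]
mu_frag_hat = [0.7, -0.1, 0.1]
x_prev = sample_mixture(mu_mol_hat, mu_frag_hat, sigma=0.1, weight=0.6, rng=rng)
```

Sampling a component and then adding noise is the textbook way to draw from a Gaussian mixture. `mixture_mean` gives the mixture's expected value when a single deterministic mean is preferred.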

Energy Guidance Function

To ensure the generated molecules are chemically valid, the authors implement an energy guidance mechanism. This function assigns low energy to molecules meeting the desired validity threshold and high energy to less valid molecules. During the reverse diffusion process, this guidance steers the generation towards low-energy (i.e., highly valid) molecules. This energy-based control is critical for maintaining high chemical validity, which is a fundamental property for potential drug candidates.
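The paper's exact energy function is not reproduced here. A common way to apply energy guidance in diffusion sampling, swapped in for illustration, is the classifier-guidance-style update μ̃ = μ − s·σ²·∇E(x_t). This update pushes the reverse-step mean toward lower energy. The sketch below assumes a toy, hypothetical `toy_validity_energy` over per-atom "validity scores" and uses finite-difference gradients in place of autodiff.

```python
def numerical_grad(energy, x, eps=1e-5):
    """Central finite-difference gradient of a scalar energy at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((energy(up) - energy(down)) / (2.0 * eps))
    return grad


def guided_mean(mu, x_t, sigma, energy, scale):
    """Shift the reverse-step mean against the energy gradient.

    mu_guided = mu - scale * sigma**2 * dE/dx (evaluated at x_t), which moves
    samples toward low-energy (more valid) regions.
    """
    grad = numerical_grad(energy, x_t)
    return [m - scale * sigma ** 2 * g for m, g in zip(mu, grad)]


def toy_validity_energy(x, threshold=0.5):
    """Hypothetical energy: zero when every per-atom validity score x_i >= threshold,
    growing quadratically with each shortfall below it."""
    return sum(max(0.0, threshold - xi) ** 2 for xi in x)


x_t = [0.1, 0.9]
mu = [0.1, 0.9]  # stand-in for the denoiser's predicted mean
mu_guided = guided_mean(mu, x_t, sigma=1.0, energy=toy_validity_energy, scale=1.0)
print(mu_guided)  # approximately [0.9, 0.9]
```

Only the coordinate below the threshold is moved. Coordinates already in the zero-energy region receive no gradient, which matches the "low energy for valid, high energy for less valid" behavior described above.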

Multi-objective Optimization (OMP)

Because drug molecules have diverse and sometimes conflicting properties, the authors employ multi-objective optimization based on Pareto optimality. The OMP module optimizes multiple molecular properties (e.g., QED and PlogP) simultaneously rather than collapsing them into a single aggregate score, with the aim of bringing every targeted property to a desirable level. The resulting multi-objective problem is solved using Karush-Kuhn-Tucker (KKT) conditions, which characterize Pareto-optimal solutions, i.e., solutions in which no property can be improved without worsening another.
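One well-known KKT-based approach to this kind of problem is the multiple-gradient descent algorithm (MGDA). MGDA takes the minimum-norm point in the convex hull of the per-objective gradients as a common improvement direction. The sketch below shows the closed-form two-objective case. It is an illustration of the general technique, not a claim about the paper's exact solver. The gradients of QED and PlogP with respect to shared variables are hypothetical inputs.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_norm_two(g1, g2):
    """Two-objective MGDA step: min-norm point of the convex hull of {g1, g2}.

    Minimizes ||a*g1 + (1 - a)*g2||^2 over a in [0, 1]. Setting the derivative to
    zero gives a = (g2 - g1).g2 / ||g1 - g2||^2; the KKT conditions of the
    box constraint clip it to [0, 1].
    """
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:  # identical gradients: any convex weight gives the same point
        return 0.5, list(g1)
    a = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    a = min(1.0, max(0.0, a))
    d = [a * x + (1.0 - a) * y for x, y in zip(g1, g2)]
    return a, d


def is_pareto_stationary(d, tol=1e-8):
    """A (near-)zero min-norm direction means no step improves both objectives."""
    return dot(d, d) <= tol


# Hypothetical gradients of QED and PlogP w.r.t. some shared latent variables.
g_qed = [1.0, 0.0]
g_plogp = [0.0, 1.0]
alpha, direction = min_norm_two(g_qed, g_plogp)
print(alpha, direction)  # 0.5 [0.5, 0.5]
```

When the returned direction d is nonzero, d·g_i ≥ ‖d‖² for each objective, so a small step along d improves both properties at once. When d vanishes, the point is Pareto-stationary.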

Experimental Results

The method is evaluated through extensive experiments on two benchmark datasets, QM9 and ZINC250k. It outperforms several state-of-the-art baselines, including JT-VAE, MoFlow, GDSS, and DiGress, in validity, uniqueness, novelty, Fréchet ChemNet Distance (FCD), QED, and PlogP. In particular, it achieves 100% validity and substantially higher QED and PlogP values among the top-$k$ generated molecules on both datasets.
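To make the reported metrics concrete, here is a small sketch of validity, uniqueness, novelty, and a top-k property mean under commonly used definitions. Definitions vary between papers; for example, novelty is sometimes computed over valid rather than unique molecules. The sketch assumes molecules are already canonical SMILES strings, with `None` marking failed validity checks (in practice this step is usually done with a cheminformatics toolkit such as RDKit). It also assumes "top-k" means the average of the k best scores. QED, PlogP, and FCD computation are not shown, and the example data is toy input, not results from the paper.

```python
def generation_metrics(generated, training_set):
    """Validity, uniqueness and novelty under common (assumed) definitions.

    `generated` holds canonical SMILES strings, with None for molecules that
    failed validity checking.
    """
    valid = [s for s in generated if s is not None]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        "validity": len(valid) / len(generated) if generated else 0.0,
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


def top_k_mean(scores, k):
    """Mean of the k highest property scores (e.g. QED or PlogP)."""
    best = sorted(scores, reverse=True)[:k]
    return sum(best) / len(best) if best else 0.0


generated = ["CCO", "CCO", "c1ccccc1", None]  # toy data
metrics = generation_metrics(generated, training_set=["CCO"])
print(metrics)  # validity 0.75, uniqueness ~0.667, novelty 0.5
```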

The results indicate that diffusing on two levels (with electronic effect-driven fragmentation), combined with energy guidance and multi-objective optimization, yields superior performance in generating chemically valid, unique, and drug-like molecules. Extensive ablation studies confirm that each novel design component contributes to the overall improvement.

Implications and Future Directions

The combination of dual-level diffusion and multi-objective optimization has both practical and theoretical implications. Practically, generating molecules that satisfy multiple desirable properties at once could accelerate drug discovery and reduce its time and cost. Theoretically, integrating electronic effect-based fragmentation into the generative framework offers a way to capture the chemical diversity of real molecules more faithfully.

In future research, the focus could be on extending this dual-level diffusion model to more structural levels and incorporating more effective optimization algorithms, potentially optimizing a broader set of molecular properties. Addressing these areas could further enhance the model's applicability in more diverse chemical spaces and molecular design requirements.

In conclusion, Guo et al.'s method offers a considerable improvement over traditional single-level generative models by integrating dual-level diffusion and multi-objective optimization, representing a substantial step forward in the field of AI-driven molecule generation for drug discovery.
