Molecule Generation for Drug Design: a Graph Learning Perspective (2202.09212v2)
Abstract: Machine learning, particularly graph learning, is gaining increasing recognition for its transformative impact across various fields. One promising application is molecule design and discovery, notably within the pharmaceutical industry. Our survey offers a comprehensive overview of state-of-the-art methods in molecule design that incorporate (deep) graph learning techniques, with a particular focus on de novo drug design. We categorize these methods into three distinct groups: (i) all-at-once, (ii) fragment-based, and (iii) node-by-node. Additionally, we introduce some key public datasets and outline the commonly used evaluation metrics for both the generation and optimization of molecules. Finally, we discuss the existing challenges in this field and suggest potential directions for future research.
Authors: Nianzu Yang, Huaijin Wu, Kaipeng Zeng, Yang Li, Junchi Yan