CoarsenConf: Equivariant Coarsening with Aggregated Attention for Molecular Conformer Generation (2306.14852v2)
Abstract: Molecular conformer generation (MCG) is an important task in cheminformatics and drug discovery. The ability to efficiently generate low-energy 3D structures can avoid expensive quantum mechanical simulations, leading to accelerated virtual screenings and enhanced structural exploration. Several generative models have been developed for MCG, but many struggle to consistently produce high-quality conformers. To address this issue, we introduce CoarsenConf, which coarse-grains molecular graphs based on torsional angles and integrates them into an SE(3)-equivariant hierarchical variational autoencoder. Through equivariant coarse-graining, we aggregate the fine-grained atomic coordinates of subgraphs connected via rotatable bonds, creating a variable-length coarse-grained latent representation. Our model uses a novel aggregated attention mechanism to restore fine-grained coordinates from the coarse-grained latent representation, enabling efficient generation of accurate conformers. Furthermore, we evaluate the chemical and biochemical quality of our generated conformers on multiple downstream applications, including property prediction and oracle-based protein docking. Overall, CoarsenConf generates more accurate conformer ensembles than prior generative models.
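The coarse-graining step described in the abstract can be illustrated with a minimal sketch. It is not the CoarsenConf implementation: it assumes the rotatable bonds are already known, cuts the molecular graph at those bonds, and uses a plain centroid as the aggregation (the abstract does not specify the aggregation used in the paper). The function name `coarse_grain` and its arguments are hypothetical. A centroid is a convenient stand-in because it rotates and translates with the input coordinates, and the number of beads varies with the number of rotatable bonds.

```python
from collections import defaultdict


def coarse_grain(coords, bonds, rotatable):
    """Split a molecular graph at rotatable bonds and average each fragment.

    coords:    list of (x, y, z) tuples, one per atom
    bonds:     list of (i, j) atom-index pairs
    rotatable: iterable of (i, j) pairs to cut (assumed precomputed)
    Returns (fragments, beads): sorted atom-index lists and their centroids.
    """
    cut = {frozenset(b) for b in rotatable}

    # Build adjacency over the non-rotatable bonds only.
    adj = defaultdict(list)
    for i, j in bonds:
        if frozenset((i, j)) not in cut:
            adj[i].append(j)
            adj[j].append(i)

    # Connected components of the cut graph are the coarse-grained fragments.
    seen, fragments = set(), []
    for start in range(len(coords)):
        if start in seen:
            continue
        stack, frag = [start], []
        seen.add(start)
        while stack:
            a = stack.pop()
            frag.append(a)
            for b in adj[a]:
                if b not in seen:
                    seen.add(b)
                    stack.append(b)
        fragments.append(sorted(frag))

    # One bead per fragment: the centroid of its atoms (an assumed choice).
    beads = [
        tuple(sum(coords[a][k] for a in frag) / len(frag) for k in range(3))
        for frag in fragments
    ]
    return fragments, beads


# Toy four-atom chain with one rotatable central bond (made-up coordinates).
coords = [(0, 0, 0), (1, 0, 0), (2, 1, 0), (3, 1, 0)]
bonds = [(0, 1), (1, 2), (2, 3)]
fragments, beads = coarse_grain(coords, bonds, rotatable=[(1, 2)])
print(fragments)  # [[0, 1], [2, 3]]
print(beads)      # [(0.5, 0.0, 0.0), (2.5, 1.0, 0.0)]
```

Cutting one bond yields two beads; a molecule with more rotatable bonds yields more beads, which is the sense in which the coarse-grained representation is variable-length.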
Authors: Danny Reidenbach, Aditi S. Krishnapriyan