Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge (2402.11459v2)
Abstract: Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging. Although deep learning has shown promise, existing methods often depend on holo-protein structures (i.e., already-docked conformations, which are not available in realistic tasks) or neglect pocket sidechain conformations, which limits their practical utility and leads to unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, flexible docking, which predicts the poses of the ligand and the pocket sidechains simultaneously, and we propose Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. Specifically, we propose an energy-to-geometry mapping inspired by the Newton-Euler equation that co-models binding energy and conformations, so that the generative docking process reflects energy constraints. Comprehensive experiments on purpose-built benchmark datasets, including apo-dock and cross-dock, demonstrate our model's superior effectiveness and efficiency over current methods.
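The abstract names two ingredients without spelling them out: a diffusion bridge that transports a starting structure (for example an apo pocket) to a target docked structure, and an energy-to-geometry mapping inspired by the Newton-Euler equation. The NumPy sketch below illustrates generic textbook versions of both on plain Euclidean coordinates only. It is an assumption-laden illustration, not Re-Dock's implementation, which the abstract says extends the bridge to geometric manifolds. All function names, the unit-mass assumption, the noise scale `sigma` and the `step` size are hypothetical choices made for this example.

```python
import numpy as np


def brownian_bridge_sample(x0, x1, t, sigma, rng):
    """Sample x_t from a Euclidean Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    Marginal: x_t ~ N((1 - t) * x0 + t * x1, sigma**2 * t * (1 - t) * I).
    """
    mean = (1.0 - t) * x0 + t * x1
    std = sigma * np.sqrt(t * (1.0 - t))
    return mean + std * rng.standard_normal(x0.shape)


def bridge_drift(x_t, x1, t):
    """Drift (x1 - x_t) / (1 - t) that steers the bridge toward x1, valid for t < 1."""
    return (x1 - x_t) / (1.0 - t)


def axis_angle_to_matrix(v):
    """Rodrigues' formula: rotation vector v (axis * angle) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return np.eye(3)
    k = v / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)


def newton_euler_step(coords, forces, step=1.0):
    """Map per-atom forces (e.g. -dE/dx of some binding energy) to a rigid-body update.

    Assumes unit masses. Newton: the net force translates the centroid.
    Euler (body at rest): I * d(omega)/dt = torque, so omega is taken as
    step * I^{-1} torque and applied as a rotation about the centroid.
    """
    center = coords.mean(axis=0)
    rel = coords - center
    net_force = forces.sum(axis=0)
    torque = np.cross(rel, forces).sum(axis=0)
    # Inertia tensor for unit masses: sum_i (|r_i|^2 I - r_i r_i^T)
    inertia = (rel ** 2).sum() * np.eye(3) - rel.T @ rel
    # Small ridge term keeps the solve stable for (near-)collinear atoms
    omega = np.linalg.solve(inertia + 1e-6 * np.eye(3), torque)
    rotation = axis_angle_to_matrix(step * omega)
    translation = step * net_force / len(coords)
    return rel @ rotation.T + center + translation


# Usage: noisy intermediate between a toy "apo" and "holo" point cloud,
# then one rigid update driven by (here random) per-atom forces.
rng = np.random.default_rng(0)
apo = rng.standard_normal((5, 3))
holo = apo + 1.0
x_mid = brownian_bridge_sample(apo, holo, 0.3, 0.5, rng)
x_next = newton_euler_step(x_mid, rng.standard_normal((5, 3)), step=0.1)
```

Because every update is a rotation plus a translation, `newton_euler_step` preserves all interatomic distances, which is the property that makes a force-to-rigid-motion mapping attractive for keeping a ligand or fragment geometrically valid.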