
Re-Dock: Towards Flexible and Realistic Molecular Docking with Diffusion Bridge (2402.11459v2)

Published 18 Feb 2024 in q-bio.BM, cs.AI, cs.LG, and physics.chem-ph

Abstract: Accurate prediction of protein-ligand binding structures, a task known as molecular docking, is crucial for drug design but remains challenging. While deep learning has shown promise, existing methods often depend on holo-protein structures (docked, and not accessible in realistic tasks) or neglect pocket sidechain conformations, leading to limited practical utility and unrealistic conformation predictions. To fill these gaps, we introduce an under-explored task, named flexible docking, to predict poses of ligand and pocket sidechains simultaneously, and introduce Re-Dock, a novel diffusion bridge generative model extended to geometric manifolds. Specifically, we propose energy-to-geometry mapping inspired by the Newton-Euler equation to co-model the binding energy and conformations for reflecting the energy-constrained docking generative process. Comprehensive experiments on designed benchmark datasets including apo-dock and cross-dock demonstrate our model's superior effectiveness and efficiency over current methods.


Summary

  • The paper introduces Re-Dock, a diffusion bridge model that simultaneously predicts ligand and pocket sidechain poses in flexible docking.
  • It employs an energy-to-geometry mapping on non-Euclidean manifolds and autoregressive modeling to realistically simulate induced fit.
  • On benchmarks including apo-dock and cross-dock, the authors report that Re-Dock outperforms current methods in both effectiveness and efficiency, which suggests it could be integrated into drug discovery pipelines.

Re-Dock: Advancing Flexibility and Realism in Molecular Docking through a Novel Diffusion Bridge Approach

Introduction

Molecular docking, a pivotal step in drug discovery, predicts how small molecules (ligands) bind to proteins to influence their biological activity. A central challenge is modeling how the protein and ligand change conformation during binding, a process known as induced fit. Traditional approaches, and even recent deep learning methods, often depend on holo (already docked) protein structures that are not available in realistic tasks, or they ignore the substantial sidechain flexibility of pocket residues. To address these limitations, the authors propose Re-Dock, a generative model designed for flexible docking. It treats both the ligand and the pocket sidechains as flexible and generates their poses with an interaction-aware geometric diffusion bridge.

Novel Contributions

The Re-Dock framework introduces several key innovations:

  • It targets the flexible docking task, predicting ligand and pocket sidechain poses simultaneously under realistic constraints, which is significant for practical applications in drug discovery.
  • Utilizing a diffusion bridge generative model extended to non-Euclidean manifolds, Re-Dock employs an energy-to-geometry mapping strategy inspired by mechanics principles. This approach enables the co-modeling of binding energy and conformational poses within a unified framework.
  • Benchmarking on specially designed datasets shows Re-Dock's superior performance in accurately predicting flexible docking structures, outperforming current methods in both effectiveness and efficiency.
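
To make the term "diffusion bridge" concrete, the sketch below samples a plain Brownian bridge, a stochastic path pinned at both a start point and an end point. It is only a Euclidean toy and is not Re-Dock's model, which the paper extends to geometric manifolds and conditions on protein-ligand interactions. The function name `brownian_bridge_path` and the parameters `n_steps` and `sigma` are assumptions made for this illustration.

```python
import numpy as np

def brownian_bridge_path(x0, x1, n_steps=100, sigma=1.0, seed=0):
    """Sample one Brownian bridge path from x0 (t=0) to x1 (t=1).

    Uses the construction X_t = x0 + t * (x1 - x0) + sigma * (W_t - t * W_1),
    where W is a standard Brownian motion. The noise term vanishes at
    t=0 and t=1, so both endpoints are pinned.
    """
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    x1 = np.asarray(x1, dtype=float)

    t = np.linspace(0.0, 1.0, n_steps + 1)
    dt = 1.0 / n_steps

    # Brownian motion W_t as a cumulative sum of Gaussian increments.
    increments = rng.standard_normal((n_steps,) + x0.shape) * np.sqrt(dt)
    w = np.concatenate([np.zeros((1,) + x0.shape), np.cumsum(increments, axis=0)])

    # Reshape t so it broadcasts against coordinates of any shape.
    t_b = t.reshape((-1,) + (1,) * x0.ndim)
    bridge_noise = w - t_b * w[-1]

    return x0 + t_b * (x1 - x0) + sigma * bridge_noise

# Hypothetical usage: 5 "atoms" in 3D moving from an initial to a final pose.
start = np.zeros((5, 3))
end = np.ones((5, 3))
path = brownian_bridge_path(start, end, n_steps=50)
```

In a docking setting, the start point would play the role of an initial (for example, unbound) conformation and the end point a bound one; a learned bridge replaces the fixed noise process with a trained drift.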

Methodology

Re-Dock's diffusion bridge is built on implicit geometric manifolds and incorporates pocket sidechain flexibility into pose generation, which is needed to mimic the induced fit observed in real protein-ligand interactions. An energy-to-geometry mapping inspired by the Newton-Euler equation lets the model treat binding energy and conformations together, so that generation reflects an energy-constrained docking process. Sidechain distributions are modeled autoregressively, with the aim of producing high-quality poses. The benchmarks cover traditional flexible re-docking, apo-dock with both crystal and predicted structures, and cross-dock scenarios.
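
As a rough illustration of how forces (gradients of an energy) can be turned into a geometric update in the spirit of the Newton-Euler equations, the sketch below moves a point cloud as a rigid body: the net force gives a translation and the net torque, passed through the inverse inertia tensor, gives a rotation. This is a textbook-style toy under stated assumptions (unit masses, body initially at rest, a single pseudo-time `step`), not the paper's actual mapping. The names `rotation_matrix` and `rigid_body_step` are hypothetical.

```python
import numpy as np

def rotation_matrix(rotvec):
    """Rodrigues' formula: rotation matrix for an axis-angle vector."""
    angle = np.linalg.norm(rotvec)
    if angle < 1e-12:
        return np.eye(3)
    k = rotvec / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def rigid_body_step(coords, forces, step=0.1):
    """Move a set of unit-mass points rigidly in response to per-point forces.

    Assumption: `step` is a pseudo-time scale that converts the net force
    into a translation and I^{-1} @ torque into a rotation vector.
    """
    coords = np.asarray(coords, dtype=float)
    forces = np.asarray(forces, dtype=float)

    center = coords.mean(axis=0)
    r = coords - center                       # positions relative to the centroid

    net_force = forces.sum(axis=0)
    torque = np.cross(r, forces).sum(axis=0)  # sum_i r_i x f_i

    # Inertia tensor for unit masses: sum_i (|r_i|^2 * Id - r_i r_i^T).
    inertia = (r ** 2).sum() * np.eye(3) - r.T @ r

    # pinv tolerates degenerate (e.g. collinear) point sets.
    rotvec = step * (np.linalg.pinv(inertia) @ torque)
    shift = step * net_force / len(coords)

    R = rotation_matrix(rotvec)
    return r @ R.T + center + shift

# Hypothetical usage with random coordinates and forces.
rng = np.random.default_rng(0)
ligand = rng.standard_normal((6, 3))
forces = rng.standard_normal((6, 3))
moved = rigid_body_step(ligand, forces, step=0.1)
```

Because the update is a rotation plus a translation, all pairwise distances within the point cloud are preserved; flexible degrees of freedom such as torsion angles would need their own update rule.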

Theoretical Underpinnings and Practical Applications

Re-Dock bridges the gap between theory and application by modeling the complex, dynamic process of molecular docking in a more accurate and pragmatic manner. It leverages knowledge from areas such as rigid body mechanics to inform its approach to drug design, underscoring the interdisciplinary nature of modern computational biology. On a practical level, Re-Dock's success in benchmark tests suggests its potential integration into drug discovery pipelines, offering a more realistic tool for the identification and optimization of novel therapeutics.

Future Directions

While Re-Dock represents a significant step forward in molecular docking, future research could focus on improving the model’s scalability and generalization to unseen proteins. The integration of additional biophysical insights into the diffusion model could further enhance its predictive accuracy. Moreover, extending the Re-Dock framework to encompass more complex biological interactions, such as protein-protein interactions, might broaden its utility across biomedical research.

Conclusion

Re-Dock introduces a new approach to flexible docking. By combining principles from rigid body mechanics with a diffusion bridge generative model, it predicts ligand and pocket sidechain poses together, which makes docking predictions more realistic and more applicable to settings where only unbound protein structures are available. If its benchmark results carry over to practice, tools like Re-Dock could help accelerate the discovery of new therapeutic agents.