SurfPro: Functional Protein Design Based on Continuous Surface (2405.06693v2)

Published 7 May 2024 in q-bio.BM and cs.LG

Abstract: How can we design proteins with desired functions? We are motivated by the chemical intuition that both geometric structure and biochemical properties are critical to a protein's function. In this paper, we propose SurfPro, a new method to generate functional proteins given a desired surface and its associated biochemical properties. SurfPro comprises a hierarchical encoder that progressively models the geometric shape and biochemical features of a protein surface, and an autoregressive decoder that produces an amino acid sequence. We evaluate SurfPro on the standard inverse folding benchmark CATH 4.2 and on two functional protein design tasks: protein binder design and enzyme design. SurfPro consistently surpasses previous state-of-the-art inverse folding methods, achieving a recovery rate of 57.78% on CATH 4.2 and higher success rates on protein-protein binding and enzyme-substrate interaction scores.
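The 57.78% figure is a sequence recovery rate. In inverse folding this is commonly defined as the fraction of positions at which the designed residue matches the native residue, aggregated over the test proteins. The sketch below assumes that definition and takes the median across proteins, a convention often used for CATH 4.2 reporting; the abstract does not state SurfPro's exact evaluation protocol, and the function names and toy sequences are hypothetical.

```python
from statistics import median


def recovery_rate(native: str, designed: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    if not native:
        raise ValueError("empty sequence")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


def benchmark_recovery(pairs):
    """Median per-protein recovery over (native, designed) sequence pairs.

    Assumption: median aggregation; some works report the mean instead.
    """
    return median(recovery_rate(n, d) for n, d in pairs)


# Toy usage with made-up sequences (not benchmark data).
pairs = [("MKTAYIAK", "MKTAYLAK"), ("GSHMLE", "GSHMAE")]
print(benchmark_recovery(pairs))
```

Per-protein scoring keeps short and long chains equally weighted; pooling all residues into one ratio instead would let long proteins dominate the score.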
