A Text-guided Protein Design Framework (2302.04611v4)
Abstract: Current AI-assisted protein design mainly utilizes protein sequence and structural information. Meanwhile, a vast body of human-curated knowledge describing proteins' high-level functionalities exists in text form, yet whether incorporating such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator, which generates a protein representation from the text modality; and a decoder, which generates protein sequences from that representation. To train ProteinDT, we construct SwissProtCLAP, a large dataset of 441K text-protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy in text-guided protein generation; (2) the best hit ratio on 12 zero-shot text-guided protein editing tasks; and (3) superior performance on four out of six protein property prediction benchmarks.
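The abstract outlines a three-stage pipeline: contrastive text-protein alignment (ProteinCLAP), a facilitator mapping text representations to protein representations, and a decoder producing sequences. Below is a minimal PyTorch sketch of that pipeline's shape. Every architectural specific here, including the encoders, the shared dimension `EMB_DIM`, the InfoNCE temperature, the MLP facilitator, and the recurrent decoder head, is an illustrative assumption, not the paper's actual implementation.

```python
# Minimal sketch of the three ProteinDT stages described in the abstract.
# The encoders, dimensions, temperature, and decoder head are illustrative
# assumptions, not the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 256  # assumed shared latent dimension


class ProteinCLAP(nn.Module):
    """Step 1: contrastively align text and protein representations."""

    def __init__(self, text_encoder: nn.Module, protein_encoder: nn.Module):
        super().__init__()
        self.text_encoder = text_encoder        # e.g. a pretrained text LM
        self.protein_encoder = protein_encoder  # e.g. a pretrained protein LM
        self.temperature = 0.07                 # assumed InfoNCE temperature

    def forward(self, text_batch, protein_batch):
        t = F.normalize(self.text_encoder(text_batch), dim=-1)       # (B, D)
        p = F.normalize(self.protein_encoder(protein_batch), dim=-1)  # (B, D)
        logits = t @ p.T / self.temperature                          # (B, B)
        labels = torch.arange(t.size(0), device=t.device)
        # Symmetric InfoNCE: each text matches its paired protein, and vice versa.
        return 0.5 * (F.cross_entropy(logits, labels)
                      + F.cross_entropy(logits.T, labels))


class Facilitator(nn.Module):
    """Step 2: map an aligned text representation to a protein representation."""

    def __init__(self, dim: int = EMB_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, text_repr):
        return self.net(text_repr)


class SequenceDecoder(nn.Module):
    """Step 3: decode a protein representation into per-position amino-acid
    logits (a toy recurrent head standing in for the paper's decoder)."""

    def __init__(self, vocab_size: int = 25, dim: int = EMB_DIM):
        super().__init__()
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, protein_repr, max_len: int = 64):
        h0 = protein_repr.unsqueeze(0).contiguous()          # (1, B, D) init state
        x = protein_repr.unsqueeze(1).repeat(1, max_len, 1)  # condition every step
        out, _ = self.gru(x, h0)
        return self.head(out)                                # (B, L, vocab) logits


# Toy usage with random features standing in for tokenized inputs.
text_enc, prot_enc = nn.Linear(128, EMB_DIM), nn.Linear(128, EMB_DIM)
clap_loss = ProteinCLAP(text_enc, prot_enc)(torch.randn(8, 128), torch.randn(8, 128))
seq_logits = SequenceDecoder()(Facilitator()(text_enc(torch.randn(8, 128))))
```

In this reading, the contrastive stage plays the role CLIP plays in text-to-image systems, and the facilitator bridges the residual gap between the two aligned latent spaces before decoding; the abstract states the stages are trained on the 441K SwissProtCLAP text-protein pairs.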