
T-Rex: Text-assisted Retrosynthesis Prediction (2401.14637v1)

Published 26 Jan 2024 in cs.CL

Abstract: As a fundamental task in computational chemistry, retrosynthesis prediction aims to identify a set of reactants that can synthesize a target molecule. Existing template-free approaches consider only the graph structure of the target molecule and often fail to generalize to rare reaction types and large molecules. Here, we propose T-Rex, a text-assisted retrosynthesis prediction approach that exploits pre-trained text LLMs, such as ChatGPT, to assist the generation of reactants. T-Rex first uses ChatGPT to generate a description of the target molecule and ranks candidate reaction centers based on both the description and the molecular graph. It then re-ranks these candidates by querying descriptions of each reactant and examining which group of reactants can best synthesize the target molecule. We observed that T-Rex substantially outperformed graph-based state-of-the-art approaches on two datasets, indicating the effectiveness of considering text information. We further found that T-Rex outperformed a variant that uses only the ChatGPT-based description without the re-ranking step, demonstrating that our framework outperforms a straightforward integration of ChatGPT and graph information. Collectively, we show that text generated by pre-trained LLMs can substantially improve retrosynthesis prediction, opening up new avenues for exploiting ChatGPT to advance computational chemistry. The code is available at https://github.com/lauyikfung/T-Rex.
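The rank-then-re-rank flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Candidate` type, the `two_stage_rank` function, the `top_k` cutoff, and both scoring callables are hypothetical stand-ins. In T-Rex the first score would come from a model that sees the target's text description and molecular graph, and the second from a model that sees the reactants' descriptions; here they are plain functions supplied by the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container: one proposed reaction center and its reactants."""
    reaction_center: str
    reactants: Tuple[str, ...]


# A scorer maps (target molecule, candidate) to a number; higher is better.
Scorer = Callable[[str, Candidate], float]


def two_stage_rank(
    target: str,
    candidates: Sequence[Candidate],
    center_score: Scorer,
    reactant_score: Scorer,
    top_k: int = 3,
) -> List[Candidate]:
    """Shortlist by reaction-center score, then re-rank by reactant score."""
    # Stage 1: rank candidate reaction centers and keep the best top_k.
    shortlist = sorted(
        candidates, key=lambda c: center_score(target, c), reverse=True
    )[:top_k]
    # Stage 2: re-rank the shortlist by how well the reactants fit the target.
    return sorted(
        shortlist, key=lambda c: reactant_score(target, c), reverse=True
    )
```

With toy scorers, a candidate that ranks second in the first stage can move to the top after re-ranking, while a candidate cut from the shortlist cannot be recovered however well its reactants score. The cutoff therefore trades re-ranking cost against recall.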
