T-Rex: Text-assisted Retrosynthesis Prediction (2401.14637v1)
Abstract: As a fundamental task in computational chemistry, retrosynthesis prediction aims to identify a set of reactants that can be used to synthesize a target molecule. Existing template-free approaches consider only the graph structure of the target molecule and often fail to generalize to rare reaction types and large molecules. Here, we propose T-Rex, a text-assisted retrosynthesis prediction approach that exploits pre-trained text LLMs, such as ChatGPT, to assist the generation of reactants. T-Rex first uses ChatGPT to generate a description of the target molecule and ranks candidate reaction centers based on both the description and the molecular graph. It then re-ranks these candidates by querying descriptions for each reactant and examining which group of reactants can best synthesize the target molecule. We observed that T-Rex substantially outperformed graph-based state-of-the-art approaches on two datasets, indicating the effectiveness of considering text information. We further found that T-Rex outperformed a variant that uses only the ChatGPT-based description without the re-ranking step, demonstrating that our framework outperforms a straightforward integration of ChatGPT and graph information. Collectively, we show that text generated by pre-trained LLMs can substantially improve retrosynthesis prediction, opening up new avenues for exploiting ChatGPT to advance computational chemistry. The code is available at https://github.com/lauyikfung/T-Rex.
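The two-stage procedure described in the abstract (rank candidate reaction centers, then re-rank using reactant descriptions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Candidate`, `two_stage_rank`, `describe`, `center_score`, `reactant_score`, and `top_k` are hypothetical names, the LLM call is replaced by a stub, and the scores are fixed toy numbers chosen only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container: one candidate reaction center and its reactants."""
    center: str
    reactants: Tuple[str, ...]  # reactant SMILES strings


def two_stage_rank(
    target: str,
    candidates: Sequence[Candidate],
    describe: Callable[[str], str],
    center_score: Callable[[str, str, Candidate], float],
    reactant_score: Callable[[str, str, Candidate, List[str]], float],
    top_k: int = 3,
) -> List[Candidate]:
    """Rank candidates with text + graph information, then re-rank the shortlist."""
    # One text description of the target (an LLM query in the paper; a stub here).
    target_text = describe(target)

    # Stage 1: score each candidate reaction center, keep the top_k best.
    ranked = sorted(
        candidates,
        key=lambda c: center_score(target, target_text, c),
        reverse=True,
    )
    shortlist = ranked[:top_k]

    # Stage 2: describe each reactant and re-rank the shortlist by how well
    # the reactant group is judged to synthesize the target.
    def rerank_key(c: Candidate) -> float:
        reactant_texts = [describe(r) for r in c.reactants]
        return reactant_score(target, target_text, c, reactant_texts)

    return sorted(shortlist, key=rerank_key, reverse=True)


# Toy usage with stub scorers (assumed values, not real model outputs).
stage1 = {"center-1": 0.5, "center-2": 0.9, "center-3": 0.1}
stage2 = {"center-1": 0.8, "center-2": 0.3, "center-3": 0.0}

candidates = [
    Candidate("center-1", ("CC(=O)O", "CCO")),
    Candidate("center-2", ("CC(=O)Cl", "CCO")),
    Candidate("center-3", ("CC", "CC(=O)O")),
]

result = two_stage_rank(
    target="CC(=O)OCC",
    candidates=candidates,
    describe=lambda smiles: f"description of {smiles}",  # stand-in for an LLM call
    center_score=lambda t, text, c: stage1[c.center],
    reactant_score=lambda t, text, c, texts: stage2[c.center],
    top_k=2,
)
print([c.center for c in result])
```

In this toy run, stage 1 shortlists `center-2` and `center-1`, and stage 2 reverses their order, which mirrors the abstract's point that re-ranking with reactant descriptions can change the outcome of the initial ranking.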