Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 188 tok/s

Gemini 2.5 Pro 46 tok/s Pro

GPT-5 Medium 37 tok/s Pro

GPT-5 High 34 tok/s Pro

GPT-4o 102 tok/s Pro

Kimi K2 203 tok/s Pro

GPT OSS 120B 457 tok/s Pro

Claude Sonnet 4.5 32 tok/s Pro

2000 character limit reached

Can Large Language Models Understand Molecules? (2402.00024v3)

Published 5 Jan 2024 in q-bio.BM, cs.AI, cs.CL, and cs.LG

Abstract: Purpose: LLMs like GPT (Generative Pre-trained Transformer) from OpenAI and LLaMA (LLM Meta AI) from Meta AI are increasingly recognized for their potential in the field of cheminformatics, particularly in understanding Simplified Molecular Input Line Entry System (SMILES), a standard method for representing chemical structures. These LLMs also have the ability to decode SMILES strings into vector representations. Method: We investigate the performance of GPT and LLaMA compared to pre-trained models on SMILES in embedding SMILES strings on downstream tasks, focusing on two key applications: molecular property prediction and drug-drug interaction prediction. Results: We find that SMILES embeddings generated using LLaMA outperform those from GPT in both molecular property and DDI prediction tasks. Notably, LLaMA-based SMILES embeddings show results comparable to pre-trained models on SMILES in molecular prediction tasks and outperform the pre-trained models for the DDI prediction tasks. Conclusion: The performance of LLMs in generating SMILES embeddings shows great potential for further investigation of these models for molecular embedding. We hope our study bridges the gap between LLMs and molecular embedding, motivating additional research into the potential of LLMs in the molecular representation field. GitHub: https://github.com/sshaghayeghs/LLaMA-VS-GPT

References (58)