Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 153 tok/s

Gemini 2.5 Pro 48 tok/s Pro

GPT-5 Medium 28 tok/s Pro

GPT-5 High 18 tok/s Pro

GPT-4o 100 tok/s Pro

Kimi K2 220 tok/s Pro

GPT OSS 120B 447 tok/s Pro

Claude Sonnet 4.5 36 tok/s Pro

2000 character limit reached

Accelerated materials language processing enabled by GPT (2308.09354v1)

Published 18 Aug 2023 in cs.CL and cond-mat.mtrl-sci

Abstract: Materials language processing (MLP) is one of the key facilitators of materials science research, as it enables the extraction of structured information from massive materials science literature. Prior works suggested high-performance MLP models for text classification, named entity recognition (NER), and extractive question answering (QA), which require complex model architecture, exhaustive fine-tuning and a large number of human-labelled datasets. In this study, we develop generative pretrained transformer (GPT)-enabled pipelines where the complex architectures of prior MLP models are replaced with strategic designs of prompt engineering. First, we develop a GPT-enabled document classification method for screening relevant documents, achieving comparable accuracy and reliability compared to prior models, with only small dataset. Secondly, for NER task, we design an entity-centric prompts, and learning few-shot of them improved the performance on most of entities in three open datasets. Finally, we develop an GPT-enabled extractive QA model, which provides improved performance and shows the possibility of automatically correcting annotations. While our findings confirm the potential of GPT-enabled MLP models as well as their value in terms of reliability and practicability, our scientific methods and systematic approach are applicable to any materials science domain to accelerate the information extraction of scientific literature.

Citations (1)

View on Semantic Scholar