
Abstract

Cutting-edge techniques developed in the general NLP domain are often subsequently applied to the high-value, data-rich biomedical domain. The past few years have seen generative language models (LMs), instruction finetuning, and few-shot learning become foci of NLP research. As such, generative LMs pretrained on biomedical corpora have proliferated and biomedical instruction finetuning has been attempted as well, all with the hope that domain specificity improves performance on downstream tasks. Given the nontrivial effort in training such models, we investigate what, if any, benefits they have in the key biomedical NLP task of relation extraction. Specifically, we address two questions: (1) Do LMs trained on biomedical corpora outperform those trained on general domain corpora? (2) Do models instruction finetuned on biomedical datasets outperform those finetuned on assorted datasets or those simply pretrained? We tackle these questions using existing LMs, testing across four datasets. In a surprising result, general-domain models typically outperformed biomedical-domain models. However, biomedical instruction finetuning improved performance to a similar degree as general instruction finetuning, despite having orders of magnitude fewer instructions. Our findings suggest it may be more fruitful to focus research effort on larger-scale biomedical instruction finetuning of general LMs over building domain-specific biomedical LMs.

Overview

  • The study explores the role of domain specificity in language models (LMs) and instruction finetuning (IFT) for enhancing biomedical relation extraction (RE), asking whether pretraining on biomedical corpora or undergoing biomedical IFT offers superior benefits over general approaches.

  • It evaluates several LMs, including BART, T5, GPT-2, BioGPT, Flan-T5, and In-BoXBART, in both full finetuning and few-shot settings on biomedical RE datasets such as CDR and ChemProt, converting RE instances into natural language sequences for generative finetuning.

  • Contrary to expectations, general-domain models mostly outperformed their biomedical-domain counterparts; however, biomedical instruction finetuning improved performance about as much as general-domain instruction finetuning while using orders of magnitude fewer instructions.

  • The findings suggest reevaluating the emphasis on domain-specific pretraining for RE tasks in favor of exploiting general-domain LMs through targeted IFT, potentially simplifying AI development strategies within biomedical and other domains.

Evaluating the Impact of Domain Specificity in Language Models for Biomedical Relation Extraction

Introduction to the Study

The intersection of generative language models (LMs) and the biomedical domain represents fertile ground for enhancing tasks such as relation extraction (RE), a pivotal component in biomedical knowledge discovery. To investigate the necessity and effectiveness of domain specificity in LMs and instruction finetuning (IFT) for biomedical RE, this study explores two central questions. First, it assesses whether LMs pretrained on biomedical corpora exhibit superior performance over those trained on general-domain corpora. Second, it examines how models that have undergone IFT on biomedical datasets fare against those finetuned on more diverse datasets or those that have merely been pretrained. These inquiries are pursued through the lens of several existing LMs and tested across four biomedical RE datasets.

Biomedical Relation Extraction and Language Models

Relation extraction involves identifying semantic relationships between entities within a text, a process critical for constructing knowledge graphs and supporting various biomedical applications. Traditionally, RE and Named Entity Recognition (NER) tasks were accomplished using encoder models; however, generative models have shown promise in handling these tasks more flexibly through natural language prompts, particularly in few-shot learning scenarios. Concurrently, instruction finetuning has emerged as a method to align generative LMs towards specific task objectives, potentially enhancing their performance across various datasets.
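To make the few-shot setup concrete, the sketch below assembles a handful of labeled examples into a prompt and asks a general-domain causal LM to complete the relation for a new sentence. It is a minimal sketch: the example sentences, relation labels, prompt template, and the choice of GPT-2 are illustrative assumptions rather than the paper's exact configuration.

```python
# Minimal few-shot relation-extraction sketch with a general-domain causal LM.
# Prompt template, sentences, and labels below are hypothetical illustrations.
from transformers import AutoModelForCausalLM, AutoTokenizer

FEW_SHOT_EXAMPLES = [
    ("Tamoxifen can induce hepatitis in some patients.",
     "tamoxifen", "hepatitis", "chemical-induced-disease"),
    ("Aspirin showed no association with gastric ulcer in this cohort.",
     "aspirin", "gastric ulcer", "no-relation"),
]

def build_prompt(query_sentence, head, tail):
    """Linearize the labeled examples plus the query into one prompt string."""
    blocks = [
        f"Sentence: {sent}\nEntities: {h}; {t}\nRelation: {label}"
        for sent, h, t, label in FEW_SHOT_EXAMPLES
    ]
    blocks.append(f"Sentence: {query_sentence}\nEntities: {head}; {tail}\nRelation:")
    return "\n\n".join(blocks)

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = build_prompt(
    "Cisplatin treatment was followed by acute renal failure.",
    "cisplatin", "acute renal failure",
)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs, max_new_tokens=8, pad_token_id=tokenizer.eos_token_id
)
# Decode only the newly generated tokens, i.e. the predicted relation string.
prediction = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(prediction.strip())
```

The same prompt can be reused across models; swapping the general-domain checkpoint for a biomedical LM such as BioGPT is the kind of comparison the study performs.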

Investigation and Methodology

The study harnessed a selection of biomedical and general-domain LMs, including but not limited to variants of BART, T5, GPT-2, and BioGPT, alongside instruction-finetuned models like Flan-T5 and In-BoXBART. These models were evaluated in both full finetuning and few-shot settings across datasets such as CDR and ChemProt, which encompass diverse biomedical relations. RE instances were converted into natural language input-target sequences so that generative LMs could be finetuned directly on the task.
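One plausible form of this conversion is sketched below for a CDR-style chemical-disease instance, producing (input, target) text pairs suitable for seq2seq finetuning with a model like T5 or BART. The field names and the way labels are verbalized are assumptions made for illustration; the paper's exact linearization scheme may differ.

```python
# Sketch: map structured RE instances to (input, target) natural language pairs
# for seq2seq finetuning. Field names and label verbalizations are hypothetical.
from typing import Dict, List, Tuple

LABEL_VERBALIZER = {
    "CID": "induces",                      # CDR-style chemical-induced disease
    "NONE": "has no stated relation to",   # negative / unrelated pair
}

def linearize(instance: Dict) -> Tuple[str, str]:
    """Turn one RE instance into an input prompt and a target sentence."""
    source = (
        f"extract relation: {instance['text']} "
        f"chemical: {instance['chemical']} disease: {instance['disease']}"
    )
    verb = LABEL_VERBALIZER[instance["label"]]
    target = f"{instance['chemical']} {verb} {instance['disease']}"
    return source, target

def build_training_pairs(instances: List[Dict]) -> List[Tuple[str, str]]:
    return [linearize(inst) for inst in instances]

example = {
    "text": "Cisplatin treatment was followed by acute renal failure.",
    "chemical": "cisplatin",
    "disease": "acute renal failure",
    "label": "CID",
}
src, tgt = linearize(example)
print(src)
print(tgt)  # cisplatin induces acute renal failure
```

Pairs like these can be tokenized and fed to any encoder-decoder or decoder-only LM, which is what allows the same RE data to be shared across the general-domain and biomedical models being compared.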

Key Findings

Surprisingly, the investigation revealed that general-domain models typically outperformed their biomedical-domain counterparts across most datasets and settings. However, models that underwent biomedical IFT showed performance improvements comparable to those achieved through general-domain IFT, despite using orders of magnitude fewer instructions. These findings prompt a reconsideration of the prevailing assumption that domain-specific pretraining universally yields better models for specialized tasks like biomedical RE.

Theoretical and Practical Implications

The results suggest that the advantages of domain-specific pretraining for RE tasks might be outweighed by the benefits derived from the broader, more diverse linguistic representations captured by general-domain LMs. Notably, the effective application of IFT, even with a limited set of biomedical instructions, underscores the potential of tailored model tuning over the development of domain-specific models from scratch. These insights advocate for a strategic pivot towards leveraging and refining existing general-domain LMs through targeted instruction finetuning, optimizing the balance between model performance and the resource-intensive process of model development.
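To illustrate what a small, targeted set of biomedical instructions might look like in practice, the sketch below wraps RE examples in instruction templates, in the style of instruction-tuning collections. The templates and the record format are hypothetical assumptions, not the instruction set used in the paper.

```python
# Sketch: wrap task examples in instruction templates to build a small
# biomedical instruction-finetuning set. Templates here are hypothetical.
import random

INSTRUCTION_TEMPLATES = [
    "Identify the relation between {head} and {tail} in the passage: {text}",
    "Read the passage and state how {head} is related to {tail}. Passage: {text}",
    "Given the text '{text}', what is the relationship between {head} and {tail}?",
]

def to_instruction_record(text, head, tail, answer, rng=random):
    """Produce one instruction-tuning record as an instruction/output pair."""
    template = rng.choice(INSTRUCTION_TEMPLATES)
    return {
        "instruction": template.format(head=head, tail=tail, text=text),
        "output": answer,
    }

record = to_instruction_record(
    text="Cisplatin treatment was followed by acute renal failure.",
    head="cisplatin",
    tail="acute renal failure",
    answer="cisplatin induces acute renal failure",
)
print(record["instruction"])
print(record["output"])
```

Even a modest number of such templated records, built from existing biomedical datasets, reflects the comparatively small scale of biomedical IFT that the study found competitive with much larger general-domain instruction collections.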

Future Directions

This research opens avenues for further exploration beyond biomedical RE, encouraging the examination of domain specificity and IFT's impact across different fields and tasks. Moreover, expanding the scale and scope of biomedical IFT, potentially harnessing larger biomedical metadatasets, could unearth further enhancements in model performance. While the findings predominantly pertain to RE tasks, their implications could inform broader strategies in AI application development within and beyond the biomedical domain.

Conclusion

The nuanced approach of this study, exploring the intricate dynamics between domain-specific pretraining, IFT, and RE performance, provides a foundational understanding for future AI research and development strategies. As the field evolves, continuous reassessment of these methodologies will be essential in harnessing the full potential of LMs across diverse knowledge domains.
