Taiyi: A Bilingual Fine-Tuned Large Language Model for Diverse Biomedical Tasks (2311.11608v2)

Published 20 Nov 2023 in cs.CL and cs.AI

Abstract: Objective: Most existing fine-tuned biomedical LLMs focus on enhancing performance in monolingual biomedical question answering and conversation tasks. To investigate the effectiveness of fine-tuned LLMs on diverse biomedical NLP tasks in different languages, we present Taiyi, a bilingual fine-tuned LLM for diverse biomedical tasks. Materials and Methods: We first curated a comprehensive collection of 140 existing biomedical text mining datasets (102 English and 38 Chinese) across over 10 task types. Subsequently, a two-stage supervised fine-tuning strategy is proposed to optimize model performance across the varied tasks. Results: Experimental results on 13 test sets covering named entity recognition, relation extraction, text classification, and question answering demonstrate that Taiyi achieves superior performance compared to general LLMs. A case study involving additional biomedical NLP tasks further shows Taiyi's considerable potential for bilingual biomedical multi-tasking. Conclusion: Leveraging rich, high-quality biomedical corpora and developing effective fine-tuning strategies can significantly improve the performance of LLMs within the biomedical domain, and Taiyi demonstrates this bilingual multi-tasking capability through supervised fine-tuning. However, tasks that are not inherently generative, such as information extraction, remain challenging for LLM-based generative approaches, which still underperform conventional discriminative approaches built on smaller models.

Citations (29)

Summary

  • The paper demonstrates that Taiyi, a bilingual LLM fine-tuned on 140 datasets via a two-stage process, outperforms models like ChatGPT 3.5 in varied biomedical tasks.
  • The methodology harmonizes the diverse task schemas of these datasets and applies a two-stage fine-tuning strategy, training first on non-generation tasks and then on generation tasks such as QA and dialogue.
  • Key results show Taiyi surpassing ChatGPT 3.5 on 11 of 13 test sets, highlighting its bilingual adaptability while also revealing limitations on specific tasks.

Analysis of "Taiyi: A Bilingual Fine-Tuned LLM for Diverse Biomedical Tasks"

The paper introduces Taiyi, a bilingual LLM fine-tuned for a broad array of biomedical natural language processing (BioNLP) tasks in both English and Chinese. Unlike many fine-tuned biomedical LLMs that target a single monolingual task such as biomedical question answering, Taiyi is designed to perform well across varied tasks in both languages, a capability that matters for advancing NLP in the biomedical field.

Methodology

The authors adopt a two-stage supervised fine-tuning process to optimize Taiyi for these tasks. They first curated a diverse set of 140 publicly available biomedical datasets, 102 in English and 38 in Chinese, spanning more than 10 task types, giving the model broad task coverage. Because these datasets arrive in heterogeneous formats, Taiyi's design systematically harmonizes their task schemas into a consistent format suitable for supervised fine-tuning.
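As a toy illustration of what such schema harmonization might look like, the sketch below recasts two heterogeneous records, an English NER example and a Chinese text-classification example, into a single instruction/response format. The prompt templates, field names, and labels are illustrative assumptions, not the templates actually released with Taiyi.

```python
# Hypothetical sketch: recasting heterogeneous task records into one
# instruction/response format, as an SFT corpus requires.
# Templates and field names are illustrative, not Taiyi's released prompts.

def ner_to_instruction(record):
    """Span-annotated NER record -> instruction/response pair."""
    entities = "; ".join(f"{e['text']} ({e['type']})" for e in record["entities"])
    return {
        "instruction": "Find all biomedical entities in the following text:\n"
                       + record["text"],
        "response": entities or "None",
    }

def tc_to_instruction(record):
    """Text-classification record -> instruction/response pair."""
    return {
        "instruction": "Assign a category to the following sentence:\n"
                       + record["text"],
        "response": record["label"],
    }

# English NER example with span annotations.
print(ner_to_instruction({
    "text": "Naloxone reverses the antihypertensive effect of clonidine.",
    "entities": [{"text": "Naloxone", "type": "Chemical"},
                 {"text": "clonidine", "type": "Chemical"}],
}))

# Chinese text-classification example (label translates to "neurology").
print(tc_to_instruction({"text": "患者主诉头痛三天，伴有恶心。", "label": "神经内科"}))
```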

Fine-tuning itself follows a two-stage strategy: the first stage covers tasks that are not inherently generation-based, while the second stage covers generation-based tasks such as QA and dialogue. This split lets Taiyi first specialize on the more structured tasks before generalizing across the rest in the subsequent phase. The model is built on Qwen-7B, a pre-trained Transformer with approximately 7 billion parameters, chosen for its moderate size and broad multilingual pre-training coverage.
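A minimal sketch of how such a two-stage supervised fine-tuning run could be set up with the Hugging Face Trainer is given below. The hyperparameters, placeholder data, and pad-token workaround are assumptions for illustration and not the authors' released training recipe; for a quick local dry run one could substitute a small model such as gpt2 for the 7B base.

```python
# Hedged sketch of two-stage supervised fine-tuning with Hugging Face
# transformers. Hyperparameters and placeholder data are illustrative only.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

model_name = "Qwen/Qwen-7B"  # base model reported in the paper; any causal LM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
if tokenizer.pad_token is None:          # some causal-LM tokenizers ship without a pad token
    tokenizer.pad_token = tokenizer.eos_token

def to_dataset(pairs):
    """Tokenize instruction/response pairs for causal-LM fine-tuning."""
    texts = [p["instruction"] + "\n" + p["response"] for p in pairs]
    return Dataset.from_dict(dict(tokenizer(texts, truncation=True, max_length=1024)))

def run_stage(dataset, output_dir):
    """One supervised fine-tuning pass over an instruction-formatted dataset."""
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=4, learning_rate=2e-5)
    collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)
    Trainer(model=model, args=args, train_dataset=dataset,
            data_collator=collator).train()

# Stage 1: non-generation tasks (NER, RE, TC, ...) recast as instructions.
run_stage(to_dataset([{"instruction": "Extract all chemicals: Naloxone ...",
                       "response": "Naloxone"}]), "taiyi-stage1")

# Stage 2: generation tasks (QA, dialogue), continuing from the stage-1 weights.
run_stage(to_dataset([{"instruction": "Question: What does naloxone reverse?",
                       "response": "Opioid effects such as ..."}]), "taiyi-stage2")
```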

Results

The evaluation benchmarks Taiyi against baseline models and state-of-the-art methods, including ChatGPT 3.5. Results show that Taiyi surpasses ChatGPT 3.5 on 11 of the 13 assessed test sets, although it trails state-of-the-art methods in tasks such as named entity recognition (NER), relation extraction (RE), and text classification (TC) by approximately 9% on average. Taiyi's bilingual adaptability is further highlighted by promising case-study outputs on BioNLP tasks that were not included in its training data.

Discussion and Implications

Taiyi demonstrates considerable robustness and flexibility in multilingual BioNLP tasks, suggesting that comprehensive fine-tuning across varied tasks can yield a performance gain in domain-specific contexts. However, the paper also discusses the limitations inherent in LLMs, such as hallucinations and lack of domain knowledge, indicating potential pitfalls in real-world applications like medical diagnosis. The authors advocate for leveraging additional biomedical resources and improved tuning strategies, pointing toward future work involving knowledge integration for enhanced output reliability and interpretability.

Conclusion

Overall, Taiyi is a significant contribution to fine-tuned LLMs for the biomedical domain. Its development both deepens understanding of what bilingual LLMs can do in medical applications and extends them beyond monolingual task specialization. While current limitations point to areas for improvement, Taiyi's architecture and methodological framework provide a promising foundation for multilingual BioNLP research. Future work could focus on remaining challenges such as task-specific interpretability and safety in medical applications, particularly by integrating biomedical knowledge databases and retrieval technology.