AstroLLaMA-Chat: Scaling AstroLLaMA with Conversational and Diverse Datasets (2401.01916v2)

Published 3 Jan 2024 in astro-ph.IM, astro-ph.CO, astro-ph.GA, astro-ph.SR, cs.CL, and cs.LG

Abstract: We explore the potential of enhancing LLM performance in astronomy-focused question-answering through targeted, continual pre-training. By employing a compact 7B-parameter LLaMA-2 model and focusing exclusively on a curated set of astronomy corpora -- comprising abstracts, introductions, and conclusions -- we achieve notable improvements in specialized topic comprehension. While general LLMs like GPT-4 excel in broader question-answering scenarios due to superior reasoning capabilities, our findings suggest that continual pre-training with limited resources can still enhance model performance on specialized topics. Additionally, we present an extension of AstroLLaMA: the fine-tuning of the 7B LLaMA model on a domain-specific conversational dataset, culminating in the release of the chat-enabled AstroLLaMA for community use. Comprehensive quantitative benchmarking is currently in progress and will be detailed in an upcoming full paper. The model, AstroLLaMA-Chat, is now available at https://huggingface.co/universeTBD, providing the first open-source conversational AI tool tailored for the astronomy community.
