Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases (2403.10446v1)
Abstract: We propose an end-to-end system design that utilizes Retrieval-Augmented Generation (RAG) to improve the factual accuracy of LLMs for domain-specific and time-sensitive queries over private knowledge bases. Our system integrates the RAG pipeline with upstream dataset processing and downstream performance evaluation. To address LLM hallucinations, we fine-tune models on a curated dataset drawn from CMU's extensive resources and annotated with a teacher model. Our experiments demonstrate the system's effectiveness in generating more accurate answers to domain-specific and time-sensitive inquiries. The results also reveal the limitations of fine-tuning LLMs on small-scale and skewed datasets. This research highlights the potential of RAG systems to augment LLMs with external datasets for improved performance on knowledge-intensive tasks. Our code and models are available on GitHub.
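To make the described pipeline concrete, below is a minimal RAG sketch in Python. It is illustrative only and not the authors' implementation: TF-IDF retrieval from scikit-learn stands in for the paper's embedding model, the toy knowledge-base documents are placeholders, and `call_llm` is a hypothetical stand-in for whatever LLM endpoint or fine-tuned model handles generation.

```python
# Minimal RAG sketch: retrieve the top-k passages from a private knowledge
# base, then condition the LLM's answer on the retrieved context.
# Illustrative only -- TF-IDF stands in for a learned embedding model, and
# call_llm() is a hypothetical placeholder for a real LLM API call.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy "private knowledge base" of domain-specific documents (placeholders).
KNOWLEDGE_BASE = [
    "The Language Technologies Institute is part of CMU's School of Computer Science.",
    "Spring Carnival is CMU's annual campus-wide festival held each spring.",
    "The RAG pipeline indexes documents, retrieves relevant chunks, and "
    "prepends them to the prompt before generation.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query (TF-IDF + cosine)."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top_indices = scores.argsort()[::-1][:k]
    return [docs[i] for i in top_indices]

def build_prompt(query: str, context: list[str]) -> str:
    """Ground the model by placing retrieved passages before the question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}\nAnswer:"
    )

def call_llm(prompt: str) -> str:
    # Hypothetical placeholder: a real system would send the prompt to a
    # hosted or locally fine-tuned LLM (e.g., a Llama-2 chat model).
    return f"[LLM response conditioned on a prompt of {len(prompt)} characters]"

if __name__ == "__main__":
    question = "When does Spring Carnival take place at CMU?"
    passages = retrieve(question, KNOWLEDGE_BASE)
    answer = call_llm(build_prompt(question, passages))
    print(answer)
```

In a full system, the retrieval step would typically use dense embeddings over chunked documents, and the generation step would call a model fine-tuned on the teacher-annotated dataset described in the abstract.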