RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots (2403.01193v3)

Published 2 Mar 2024 in cs.CL and cs.AI

Abstract: Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts. We empirically evaluate RAG against standard LLMs using prompts designed to induce hallucinations. Our results show that RAG increases accuracy in some cases, but can still be misled when prompts directly contradict the model's pre-trained understanding. These findings highlight the complex nature of hallucinations and the need for more robust solutions to ensure LLM reliability in real-world applications. We offer practical recommendations for RAG deployment and discuss implications for the development of more trustworthy LLMs.
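
The mechanism the abstract describes can be made concrete: retrieve external passages relevant to the query, prepend them to the prompt, and only then generate. The Python sketch below is a minimal illustration under stated assumptions, not the paper's implementation; the toy corpus, the word-overlap retriever, and the call_llm stub (standing in for any real chat-completion API) are all hypothetical.

    # Minimal RAG sketch: retrieve supporting passages, prepend them to the
    # prompt, then generate. Everything here is an illustrative assumption,
    # not the paper's code: toy corpus, word-overlap retriever, stubbed LLM.

    corpus = [
        "RAG grounds model output in retrieved documents.",
        "LLMs can hallucinate plausible but false information.",
        "Court filings have cited non-existent legal rulings.",
    ]

    def retrieve(query, docs, k=2):
        # Rank documents by naive word overlap with the query.
        q = set(query.lower().split())
        return sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                      reverse=True)[:k]

    def call_llm(prompt):
        # Hypothetical stand-in for a real chat-completion API call.
        return "[model answer conditioned on]\n" + prompt

    def rag_answer(query):
        context = "\n".join(retrieve(query, corpus))
        # External knowledge is injected ahead of the user question.
        prompt = ("Context:\n" + context +
                  "\n\nQuestion: " + query +
                  "\nAnswer using only the context above.")
        return call_llm(prompt)

    print(rag_answer("Why do LLMs hallucinate false information?"))

The paper's central caveat concerns the second half of this loop: even with correct context injected, generation can still follow a prompt that contradicts the model's pre-trained knowledge, so retrieval alone does not guarantee faithful output.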
