Evaluating Search Engines and Large Language Models for Answering Health Questions (2407.12468v3)
Abstract: Search engines (SEs) have traditionally been the primary tools for information seeking, but LLMs are emerging as powerful alternatives, particularly for question-answering tasks. This study compares the performance of four popular SEs, seven LLMs, and retrieval-augmented generation (RAG) variants in answering 150 health-related questions from the TREC Health Misinformation (HM) Track. Results reveal that SEs correctly answer between 50% and 70% of the questions, often hindered by retrieved results that do not address the health question. LLMs deliver higher accuracy, correctly answering about 80% of the questions, though their performance is sensitive to the input prompts. RAG methods significantly enhance the effectiveness of smaller LLMs, improving accuracy by up to 30% by integrating retrieved evidence.
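The RAG setup described above, grounding a yes/no health question in retrieved evidence before querying a model, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's actual pipeline: `query_llm` is a hypothetical stand-in for any chat-completion call, and the prompt wording and the `accuracy` helper are assumptions introduced for the example.

```python
# Minimal sketch of RAG-style prompting for yes/no health questions:
# the same question is answered once from the model's parametric
# knowledge (evidence=None) and once with retrieved passages prepended.
# `query_llm` is a hypothetical stand-in for any chat-completion API;
# the prompt template is illustrative, not the paper's exact wording.

from typing import Callable, List, Optional


def build_prompt(question: str, evidence: Optional[List[str]] = None) -> str:
    """Assemble a yes/no prompt, optionally grounded in retrieved passages."""
    parts: List[str] = []
    if evidence:
        parts.append("Consider the following evidence:")
        parts.extend(f"- {passage}" for passage in evidence)
    parts.append(f"Question: {question}")
    parts.append("Answer strictly 'yes' or 'no'.")
    return "\n".join(parts)


def answer_question(
    question: str,
    query_llm: Callable[[str], str],
    evidence: Optional[List[str]] = None,
) -> str:
    """Return 'yes', 'no', or 'unclear' for a health question."""
    reply = query_llm(build_prompt(question, evidence)).strip().lower()
    if reply.startswith("yes"):
        return "yes"
    if reply.startswith("no"):
        return "no"
    return "unclear"


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Fraction of questions answered with the ground-truth stance."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)
```

Comparing `accuracy` over the 150 TREC HM questions with and without the `evidence` argument would reproduce the kind of LLM-versus-RAG comparison the abstract reports, assuming a suitable retriever supplies the passages.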