Create a Video View Paper

Learning from AVA: When AI Says 'I Don't Know' and Why That Matters

This lightning talk explores AVA, a specialized generative AI system designed for policy and development research that prioritizes verifiable citations and reasoned refusal over confident but ungrounded answers. Through a five-month field study with over 2,200 professionals across 116 countries, the research demonstrates how epistemic humility—the ability to abstain when evidence is insufficient—can recalibrate trust between professionals and AI systems, offering concrete lessons for building reliable AI in high-stakes domains.

Script

When a generative AI system says it doesn't know the answer, is that a bug or a feature? For over 2,200 policy professionals working across 116 countries, that moment of refusal became the most trusted signal in their toolkit.

The authors built AVA as a fundamentally different kind of system: not an open-web answer engine, but a curated retrieval pipeline over 4,000 World Bank reports. Every claim must trace to a specific document span, and when evidence is insufficient, the system declines to answer with an explanation.

Under the hood, AVA orchestrates a multi-agent pipeline: queries are decomposed into sub-questions, retrieval combines lexical and semantic search, and a hierarchical tree-walking process maps document structure. Every sentence in the response is verified against retrieved passages, and abstention triggers when coverage or agreement falls below threshold.

The results are striking. Early in deployment, abstention rates were high because corpus coverage was limited. After expansion, abstention dropped to a stable 11.9 percent baseline, and users interpreted these refusals not as failures, but as signals of honesty. When benchmarked against Perplexity AI on the same corpus, AVA declined unsupported queries while Perplexity fabricated plausible answers with invented citations.

Trust emerges from the entire pipeline, not a single feature. Users reported saving 2.4 to 3.9 hours weekly, but they valued click-through citations above all else: selective, claim-level inspection became a shortcut for trust calibration, especially for counterintuitive results. Professional workflows now orchestrate specialized, humble agents like AVA alongside generalist large language models, with intelligent handoffs preserving accountability.

The ability to say 'I don't know' is not a limitation; it's a primary virtue in high-stakes knowledge work. As AI systems proliferate across policy, health, and legal domains, the lessons from AVA point toward a future of ecosystem-aware, epistemically humble agents that earn trust through verifiable refusal. To dive deeper into this work and create your own video summaries of cutting-edge research, visit EmergentMind.com.