Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources (2305.13269v4)

Published 22 May 2023 in cs.CL

Abstract: We present chain-of-knowledge (CoK), a novel framework that augments LLMs by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If there is no majority consensus among the answers from samples, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources such as Wikidata and tables that provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that allows the generation of queries for various types of query languages, including SPARQL, SQL, and natural sentences. Moreover, to minimize error propagation between rationales, CoK corrects the rationales progressively using preceding corrected rationales to generate and correct subsequent rationales. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.

Summary

  • The paper introduces the CoK framework that dynamically adapts queries to integrate diverse knowledge sources, boosting LLM performance by an average of 4.3% over CoT baselines.
  • It employs an adaptive query generator that produces SPARQL and SQL queries to retrieve and verify structured data from sources like Wikidata.
  • The three-stage process—reasoning preparation, dynamic knowledge adapting, and answer consolidation—significantly mitigates hallucinations and improves factual accuracy across multiple domains.

Chain-of-Knowledge: Grounding LLMs via Dynamic Knowledge Adapting over Heterogeneous Sources

This paper introduces a framework named chain-of-knowledge (CoK) that enhances LLMs by dynamically integrating grounding information from diverse sources to improve factuality and reduce hallucination. Unlike existing approaches that rely primarily on unstructured data, CoK leverages both structured knowledge sources, such as Wikidata and tabular data, and unstructured sources, using an adaptive query generator that produces and executes queries in multiple query languages, including SPARQL, SQL, and natural-language sentences.

The CoK framework consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. In the first stage, the model generates several preliminary rationales and candidate answers and identifies the relevant knowledge domains. If there is no majority consensus among the sampled answers, the dynamic knowledge adapting stage corrects the rationales step by step using information retrieved from the identified domains, with each corrected rationale conditioning the generation and correction of the next to minimize error propagation. Finally, the corrected rationales serve as a stronger foundation for answer consolidation, as sketched below.
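
To make the control flow concrete, here is a minimal sketch of the three-stage loop. The `llm` interface (`generate_cot`, `identify_domains`, `correct`, `consolidate`) and the `retrievers` mapping from domain name to retrieval function are hypothetical stand-ins for illustration, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class CoTSample:
    rationales: list  # step-by-step rationale strings
    answer: str       # final answer extracted from this sample

def chain_of_knowledge(question, llm, retrievers, n_samples=5):
    # Stage 1: reasoning preparation. Sample several CoT rationales
    # and answers, and identify the relevant knowledge domains.
    samples = [llm.generate_cot(question) for _ in range(n_samples)]
    top_answer, votes = Counter(s.answer for s in samples).most_common(1)[0]

    # Self-consistency shortcut: if a majority of sampled answers
    # already agree, accept that answer without knowledge adapting.
    if votes > n_samples // 2:
        return top_answer

    domains = llm.identify_domains(question)  # e.g. ["factual", "medical"]

    # Stage 2: dynamic knowledge adapting. Correct rationales one at a
    # time; each correction conditions on the already-corrected steps
    # so that earlier errors do not propagate into later rationales.
    corrected = []
    for step in samples[0].rationales:
        evidence = [retrievers[d](step) for d in domains]
        corrected.append(llm.correct(step, evidence, previous=corrected))

    # Stage 3: answer consolidation from the corrected rationales.
    return llm.consolidate(question, corrected)
```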

A key innovation in CoK is the adaptive query generator, a flexible component that can be either a fine-tuned model (e.g., LLaMA-2 with LoRA) or an off-the-shelf LLM (e.g., ChatGPT). This generator adapts to various knowledge sources by producing queries suited to their formats, facilitating the retrieval of more reliable and domain-specific information.
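
As a concrete illustration of the structured-source side, the snippet below runs a SPARQL query of the kind the generator might emit against the public Wikidata endpoint. The example query (Douglas Adams' place of birth, entity Q42 and property P19) is an illustrative stand-in, not an example from the paper.

```python
import requests

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"

def run_sparql(query: str) -> list:
    """Execute a SPARQL query against the public Wikidata endpoint
    and return the JSON result bindings."""
    resp = requests.get(
        WIKIDATA_ENDPOINT,
        params={"query": query, "format": "json"},
        headers={"User-Agent": "cok-sketch/0.1 (demo)"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["results"]["bindings"]

# A query the generator might emit to verify a rationale step such as
# "Douglas Adams was born in Cambridge": place of birth (P19) of
# Douglas Adams (Q42), with an English label.
query = """
SELECT ?placeLabel WHERE {
  wd:Q42 wdt:P19 ?place .
  ?place rdfs:label ?placeLabel .
  FILTER (lang(?placeLabel) = "en")
}
"""
for row in run_sparql(query):
    print(row["placeLabel"]["value"])  # expected: "Cambridge"
```

An analogous path generates SQL for tabular sources; in both cases the retrieved facts are compared against the rationale step being corrected.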

Empirical results show that CoK consistently improves LLM performance on knowledge-intensive tasks, with an average gain of 4.3% over chain-of-thought (CoT) baselines. Gains on tasks in the factual, medical, physics, and biology domains demonstrate the value of integrating diverse knowledge sources to augment LLM capabilities.

CoK effectively addresses the inherent challenges of hallucination and factual inaccuracies in LLMs. By leveraging structured knowledge sources and dynamically adapting retrieval strategies, CoK facilitates the generation of more accurate rationales and predictions. Its modular design allows it to be adapted for use with various LLMs and knowledge sources, offering potential for significant advancements in AI applications requiring robust information verification and factual precision.

This research has significant implications for the future of AI. As LLM capabilities continue to advance, frameworks like CoK may become essential in domains that demand high factual reliability, such as legal, scientific, and educational applications, where accuracy and source validation are paramount. The paper sets a precedent for integrating diverse knowledge formats and sources into model inference and evaluation, pointing a way toward reducing AI-generated misinformation. The advances in adaptive query generation also suggest directions for further research in natural language processing, particularly in enhancing the interaction between LLMs and structured knowledge bases.
