Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources (2305.13269v4)
Abstract: We present chain-of-knowledge (CoK), a novel framework that augments LLMs by dynamically incorporating grounding information from heterogeneous sources, yielding more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-intensive question, CoK first prepares several preliminary rationales and answers while identifying the relevant knowledge domains. If there is no majority consensus among the sampled answers, CoK corrects the rationales step by step by adapting knowledge from the identified domains. These corrected rationales can plausibly serve as a better foundation for the final answer consolidation. Unlike prior studies that primarily use unstructured data, CoK also leverages structured knowledge sources such as Wikidata and tables, which provide more reliable factual information. To access both unstructured and structured knowledge sources in the dynamic knowledge adapting stage, we propose an adaptive query generator that produces queries in various query languages, including SPARQL, SQL, and natural-language sentences. Moreover, to minimize error propagation between rationales, CoK corrects the rationales progressively, using preceding corrected rationales when generating and correcting subsequent ones. Extensive experiments show that CoK consistently improves the performance of LLMs on knowledge-intensive tasks across different domains.
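To make the three-stage pipeline concrete, the sketch below gives one possible reading of the abstract in Python. It is a minimal sketch, not the authors' implementation: the `llm` and `retrieve` callables, the prompt strings, the candidate-domain list, and the sampling/voting details are all illustrative assumptions.

```python
# Minimal sketch of the three CoK stages described in the abstract.
# `llm` and `retrieve` are hypothetical black-box callables supplied by the caller.
from collections import Counter
from typing import Callable, List


def chain_of_knowledge(
    question: str,
    llm: Callable[[str], str],            # black-box LLM call (assumed interface)
    retrieve: Callable[[str, str], str],  # (domain, query) -> supporting facts (assumed interface)
    candidate_domains: List[str],         # e.g. ["Wikidata", "tables", "Wikipedia text"]
    n_samples: int = 5,
) -> str:
    # --- Stage 1: reasoning preparation ---------------------------------
    # Sample several preliminary chain-of-thought rationales and answers,
    # and let the model identify the relevant knowledge domains.
    samples = [
        llm(f"Q: {question}\nThink step by step, then give the answer on the last line.")
        for _ in range(n_samples)
    ]
    answers = [s.splitlines()[-1].strip() for s in samples]
    answer, votes = Counter(answers).most_common(1)[0]

    # If the sampled answers already reach a majority consensus,
    # accept that answer without knowledge adapting.
    if votes > n_samples // 2:
        return answer

    domains = [
        d.strip()
        for d in llm(f"Which of {candidate_domains} are relevant to: {question}?").split(",")
    ]

    # --- Stage 2: dynamic knowledge adapting -----------------------------
    # Correct the rationale steps one by one; each corrected step is fed back
    # so later steps are generated from already-corrected ones, limiting
    # error propagation between rationales.
    rationale_steps = samples[0].splitlines()[:-1]
    corrected: List[str] = []
    for step in rationale_steps:
        for domain in domains:
            # Adaptive query generator: produce a query in the language of the
            # domain's source (e.g. SPARQL for Wikidata, SQL for tables,
            # a natural-language sentence for a text corpus).
            query = llm(f"Write a query for the {domain} knowledge source to verify: {step}")
            facts = retrieve(domain, query)
            step = llm(
                "Previously corrected steps:\n" + "\n".join(corrected)
                + f"\nRevise the step below so it is consistent with the facts."
                + f"\nFacts: {facts}\nStep: {step}"
            )
        corrected.append(step)

    # --- Stage 3: answer consolidation ------------------------------------
    # Generate the final answer from the corrected rationales.
    return llm(
        f"Q: {question}\nRationale:\n" + "\n".join(corrected) + "\nFinal answer:"
    )
```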