LeanReasoner: Boosting Complex Logical Reasoning with Lean (2403.13312v1)
Abstract: LLMs often struggle with complex logical reasoning due to logical inconsistencies and the inherent difficulty of such reasoning. We use Lean, a theorem-proving framework, to address these challenges. By formalizing logical reasoning problems as theorems within Lean, we can solve them by proving or disproving the corresponding theorems. This method reduces the risk of logical inconsistencies with the help of Lean's symbolic solver, and Lean's extensive library of theorem proofs strengthens our ability to tackle complex reasoning tasks. Our method achieves state-of-the-art performance on the FOLIO dataset and near state-of-the-art performance on ProofWriter. Notably, these results were achieved by fine-tuning on fewer than 100 in-domain samples per dataset.
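To make the approach concrete, here is a minimal sketch of the kind of formalization the abstract describes, written in Lean 4. The example problem, the predicate names (`Cat`, `Animal`, `tom`), and the encoding style are illustrative assumptions, not output from the paper's actual pipeline: context sentences become axioms, and the question becomes a theorem to prove or refute.

```lean
-- Hypothetical encoding of a ProofWriter-style problem (illustrative, not from the paper):
-- Context: "All cats are animals. Tom is a cat."
-- Question: "Is Tom an animal?"

-- The domain of discourse and its predicates.
axiom Obj : Type
axiom Cat : Obj → Prop
axiom Animal : Obj → Prop
axiom tom : Obj

-- Each context sentence becomes an axiom (a hypothesis the prover may use).
axiom all_cats_are_animals : ∀ x, Cat x → Animal x
axiom tom_is_cat : Cat tom

-- The question becomes a theorem; finding a proof certifies the answer "true",
-- while proving the negation would certify "false".
theorem tom_is_animal : Animal tom :=
  all_cats_are_animals tom tom_is_cat
```

Under this style of encoding, answering a true/false question reduces to searching for a proof of the goal or of its negation, with Lean's kernel guaranteeing that any accepted proof is logically sound, which is what rules out the logical inconsistencies the abstract mentions.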