$\texttt{LM}^\texttt{2}$: A Simple Society of Language Models Solves Complex Reasoning (2404.02255v1)
Abstract: Despite demonstrating emergent reasoning abilities, LLMs often lose track of complex, multi-step reasoning. Existing studies show that guiding an LLM by decomposing the original question into multiple subproblems makes its reasoning more robust -- a decomposer generates the subproblems, and a solver solves each of them. However, these techniques do not coordinate the decomposer and solver modules (whether within a single model or across specialized ones): the decomposer does not track the solver's ability to follow the decomposed reasoning. In this paper, we propose LM2 to address these challenges. LM2 modularizes decomposition, solution, and verification into three different LLMs. The decomposer module identifies the key concepts necessary to solve the problem and generates step-by-step subquestions according to the reasoning requirement. The solver module generates solutions to the subproblems, which are then checked by the verifier module; depending on the verifier's feedback, the reasoning context is constructed from the subproblems and their solutions. These models are trained to coordinate using policy learning. Exhaustive experiments show the superiority of LM2 over existing methods on in- and out-of-domain reasoning problems, outperforming the best baselines by $8.1\%$ on MATH, $7.71\%$ on JEEBench, and $9.7\%$ on MedQA problems (code available at https://github.com/LCS2-IIITD/Language_Model_Multiplex).
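The decomposer–solver–verifier loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the three callables stand in for the separately trained LLMs, the bounded-retry policy on verifier rejection is an assumption, and the policy-learning training of the modules is omitted entirely.

```python
# Hypothetical sketch of the LM2 pipeline: decompose a question into
# subquestions, solve each against the accumulated reasoning context,
# and let a verifier gate which solutions enter that context.
from typing import Callable, List, Tuple

def lm2_answer(
    question: str,
    decomposer: Callable[[str], List[str]],
    solver: Callable[[str, str], str],
    verifier: Callable[[str, str, str], bool],
    max_retries: int = 2,
) -> Tuple[str, List[Tuple[str, str]]]:
    context: List[Tuple[str, str]] = []
    for subq in decomposer(question):
        ctx_text = " ".join(f"Q: {q} A: {a}" for q, a in context)
        solution = solver(subq, ctx_text)
        # Verifier feedback: re-query the solver a bounded number of
        # times before accepting the best-effort solution.
        for _ in range(max_retries):
            if verifier(question, subq, solution):
                break
            solution = solver(subq, ctx_text)
        context.append((subq, solution))
    # The last subquestion's solution is taken as the overall answer.
    return context[-1][1], context

# Toy stubs illustrating the module interfaces (not real models).
decomposer = lambda q: ["What is 2+3?", "What is (2+3)*4?"]
solver = lambda subq, ctx: str(eval(subq.split("is")[1].rstrip("?")))
verifier = lambda q, subq, sol: sol.isdigit()

answer, trace = lm2_answer("Compute (2+3)*4.", decomposer, solver, verifier)
print(answer)  # → 20
```

In the paper the verifier's feedback also shapes how the reasoning context is built; the sketch reduces that to a simple accept/retry gate for clarity.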