Distilling Algorithmic Reasoning from LLMs via Explaining Solution Programs (2404.08148v1)

Published 11 Apr 2024 in cs.CL

Abstract: Distilling explicit chain-of-thought reasoning paths has emerged as an effective method for improving the reasoning abilities of LLMs across various tasks. However, when tackling complex tasks that pose significant challenges for state-of-the-art models, this technique often struggles to produce effective chains of thought that lead to correct answers. In this work, we propose a novel approach to distill reasoning abilities from LLMs by leveraging their capacity to explain solutions. We apply our method to solving competitive-level programming challenges. More specifically, we employ an LLM to generate explanations for a set of <problem, solution-program> pairs, then use <problem, explanation> pairs to fine-tune a smaller LLM, which we refer to as the Reasoner, to learn algorithmic reasoning that can generate "how-to-solve" hints for unseen problems. Our experiments demonstrate that learning from explanations enables the Reasoner to more effectively guide program implementation by a Coder, resulting in higher solve rates than strong chain-of-thought baselines on competitive-level programming problems. It also outperforms models that learn directly from <problem, solution-program> pairs. We curated an additional test set in the CodeContests format, which includes 246 more recent problems posted after the models' knowledge cutoff.
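The data flow described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Example`, `build_reasoner_dataset`, `solve`, and `stub_explain` are hypothetical, the prompt/completion record layout is an assumption, and the LLM calls are replaced by plain Python callables so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    """One <problem, solution-program> pair (hypothetical container)."""
    problem: str
    solution_program: str


def build_reasoner_dataset(
    examples: List[Example],
    explain: Callable[[str, str], str],
) -> List[Dict[str, str]]:
    """Turn <problem, solution-program> pairs into <problem, explanation> pairs.

    `explain` stands in for the larger LLM that explains a known solution.
    The prompt/completion layout is an assumed fine-tuning format.
    """
    dataset = []
    for ex in examples:
        explanation = explain(ex.problem, ex.solution_program)
        # The solution program is dropped: the Reasoner learns from explanations only.
        dataset.append({"prompt": ex.problem, "completion": explanation})
    return dataset


def solve(
    problem: str,
    reasoner: Callable[[str], str],
    coder: Callable[[str, str], str],
) -> str:
    """Inference: the Reasoner writes a how-to-solve hint, the Coder implements it."""
    hint = reasoner(problem)
    return coder(problem, hint)


def stub_explain(problem: str, program: str) -> str:
    """Placeholder for an LLM call; returns a deterministic dummy explanation."""
    n_lines = len(program.splitlines())
    return f"Explanation of a {n_lines}-line solution to: {problem}"


if __name__ == "__main__":
    pairs = [Example("Sum two ints", "a, b = map(int, input().split())\nprint(a + b)")]
    print(build_reasoner_dataset(pairs, stub_explain))
```

In this sketch the fine-tuning step itself is left out; the point is only that the training targets are explanations rather than programs, and that program writing is delegated to a separate Coder at inference time.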
