Self-Guiding Exploration for Combinatorial Problems (2405.17950v1)
Abstract: Large language models (LLMs) have become pivotal in addressing reasoning tasks across diverse domains, including arithmetic, commonsense, and symbolic reasoning. They rely on prompting techniques such as Exploration-of-Thought, Decomposition, and Refinement to navigate and solve intricate tasks. Despite these advances, the application of LLMs to Combinatorial Problems (CPs), which are known for their NP-hardness and play critical roles in logistics and resource management, remains underexplored. To address this gap, we introduce a novel prompting strategy, Self-Guiding Exploration (SGE), designed to improve performance in solving CPs. SGE operates autonomously, generating multiple thought trajectories for each CP task. It then breaks these trajectories down into actionable subtasks, executes them sequentially, and refines the results to improve solution quality. We present our research as the first to apply LLMs to a broad range of CPs and demonstrate that SGE outperforms existing prompting strategies by over 27.84% in CP optimization performance. Additionally, SGE achieves 2.46% higher accuracy than the best existing results on other reasoning tasks (arithmetic, commonsense, and symbolic).
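The control flow the abstract describes (explore several trajectories, decompose each into subtasks, execute them in order, refine) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the prompt wording, the `n_trajectories` parameter, and the final step that merges candidates into one answer are all assumptions introduced here.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def self_guiding_exploration(task: str, llm: LLM, n_trajectories: int = 3) -> str:
    """Sketch of an explore / decompose / execute / refine loop (assumed prompts)."""
    # 1. Exploration: ask for several distinct solution trajectories.
    trajectories: List[str] = [
        llm(f"Propose approach #{i + 1} for solving: {task}")
        for i in range(n_trajectories)
    ]

    candidates: List[str] = []
    for trajectory in trajectories:
        # 2. Decomposition: turn the trajectory into subtasks, one per line.
        plan = llm(f"List the subtasks, one per line, for: {trajectory}")
        subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

        # 3. Sequential execution: each subtask sees the result of the previous one.
        state = task
        for subtask in subtasks:
            state = llm(f"Context: {state}\nExecute subtask: {subtask}")

        # 4. Refinement: ask the model to improve the trajectory's result.
        candidates.append(llm(f"Refine this result: {state}"))

    # 5. Assumed final step: merge the refined candidates into a single answer.
    return llm(
        "Combine these candidate solutions into a final answer:\n"
        + "\n".join(candidates)
    )
```

With `n_trajectories` trajectories of `k` subtasks each, this sketch issues `n_trajectories * (k + 3) + 1` model calls, so the cost grows linearly in both the number of trajectories and the depth of decomposition.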