Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning (1911.04936v1)

Published 12 Nov 2019 in cs.LG and stat.ML

Abstract: In this work, we introduce Graph Pointer Networks (GPNs) trained using reinforcement learning (RL) for tackling the traveling salesman problem (TSP). GPNs build upon Pointer Networks by introducing a graph embedding layer on the input, which captures relationships between nodes. Furthermore, to approximate solutions to constrained combinatorial optimization problems such as the TSP with time windows, we train hierarchical GPNs (HGPNs) using RL, which learns a hierarchical policy to find an optimal city permutation under constraints. Each layer of the hierarchy is designed with a separate reward function, resulting in stable training. Our results demonstrate that GPNs trained on small-scale TSP50/100 problems generalize well to larger-scale TSP500/1000 problems, with shorter tour lengths and faster computational times. We verify that for constrained TSP problems such as the TSP with time windows, the feasible solutions found via hierarchical RL training outperform previous baselines. In the spirit of reproducible research we make our data, models, and code publicly available.
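
The abstract describes a policy that builds a city permutation one step at a time and is trained with RL. The sketch below shows only that decoding loop for a generic pointer-style decoder, under stated assumptions. `score_fn` is a hypothetical placeholder for the learned attention score. `reinforce_loss` is a plain REINFORCE objective with a baseline, which is an assumption and not the paper's exact training setup.

```python
import math
import random

def masked_softmax(scores, visited):
    """Softmax over candidate scores; visited cities get probability zero."""
    top = max(s for s, v in zip(scores, visited) if not v)
    exps = [0.0 if v else math.exp(s - top) for s, v in zip(scores, visited)]
    total = sum(exps)
    return [e / total for e in exps]

def decode_tour(coords, score_fn, start=0, greedy=True, rng=None):
    """Build a permutation by pointing at one unvisited city per step.

    score_fn(current_xy, candidate_xy) is a hypothetical stand-in for the
    learned attention score of a pointer decoder.
    Returns the tour and the summed log-probability of the choices made.
    """
    rng = rng or random.Random(0)
    n = len(coords)
    visited = [False] * n
    visited[start] = True
    tour, log_prob = [start], 0.0
    for _ in range(n - 1):
        cur = coords[tour[-1]]
        scores = [score_fn(cur, coords[j]) for j in range(n)]
        probs = masked_softmax(scores, visited)
        if greedy:
            nxt = max(range(n), key=lambda j: probs[j])
        else:
            nxt = rng.choices(range(n), weights=probs)[0]
        log_prob += math.log(probs[nxt])
        visited[nxt] = True
        tour.append(nxt)
    return tour, log_prob

def reinforce_loss(length, baseline, log_prob):
    """REINFORCE objective with a baseline (to be minimized).

    In an autodiff framework the advantage (length - baseline) is a constant and
    gradients flow only through log_prob. This lowers the probability of tours
    longer than the baseline and raises it for shorter ones.
    """
    return (length - baseline) * log_prob

# Usage: with score = -distance, greedy decoding reduces to nearest neighbor.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
tour, logp = decode_tour(square, lambda a, b: -math.dist(a, b))
```

Masking visited cities inside the softmax guarantees that every decoded sequence is a valid permutation. The constraint therefore never has to be learned for the plain TSP.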

Citations (168)


Summary

The paper presents Graph Pointer Networks (GPNs), Pointer Networks enriched with a graph embedding layer, which the authors report improve generalization and computational efficiency on TSP benchmarks.
It employs a two-layer hierarchical reinforcement learning framework to handle the constraints of the TSP with time windows (TSPTW).
Empirical results show that GPNs trained on TSP50/100 generalize to TSP500/1000 and reduce computational overhead when combined with local search methods.

Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning

The paper, "Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning," authored by Qiang Ma et al., presents an approach to combinatorial optimization problems. It focuses on the traveling salesman problem (TSP) and its constrained variant, the TSP with time windows (TSPTW). The authors develop Graph Pointer Networks (GPNs), enhanced by graph embeddings, together with a hierarchical reinforcement learning (HRL) framework. With these, they aim to improve on existing methods in generalization, computational efficiency, and the handling of constrained combinatorial problems.

The proposed GPNs extend Pointer Networks with a graph embedding layer on the input that captures relationships between nodes. Graph embeddings are designed for graph-structured (non-Euclidean) data, such as the node relationships that matter in routing problems. The GPN architecture also uses vector contexts instead of point contexts. This yields representations that transfer from models trained on small instances to larger TSP problems.
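
The two ideas in the preceding paragraph can be illustrated as follows, under stated assumptions. `vector_context` computes vectors from the current city to every city, which is how "vector context" is read here. `graph_embedding_layer` is one generic message-passing step on a fully connected graph, mixing each node's features with the mean of its neighbors. Both helpers, the mixing weight `gamma`, and the unweighted mean are hypothetical and may differ from the paper's exact layer, which would also include learned weights and a nonlinearity.

```python
def vector_context(coords, current):
    """Vectors from the current city to every city (hypothetical helper).

    Unlike raw coordinates (point context), these vectors do not change when
    the whole instance is translated, one plausible reason they transfer well.
    """
    cx, cy = coords[current]
    return [(x - cx, y - cy) for x, y in coords]

def graph_embedding_layer(features, neighbors, gamma=0.5):
    """One message-passing step: mix each node's features with its neighbors' mean.

    features: list of equal-length feature vectors; neighbors: adjacency lists.
    gamma and the unweighted mean are illustrative assumptions; a learned layer
    would also apply weight matrices and a nonlinearity.
    """
    out = []
    for i, h in enumerate(features):
        nbrs = neighbors[i]
        mean = [sum(features[j][d] for j in nbrs) / len(nbrs) for d in range(len(h))]
        out.append([gamma * h[d] + (1 - gamma) * mean[d] for d in range(len(h))])
    return out

# Usage on a three-city instance treated as a complete graph.
coords = [(0, 0), (2, 0), (0, 2)]
complete = [[j for j in range(3) if j != i] for i in range(3)]
emb = graph_embedding_layer([list(map(float, c)) for c in coords], complete)
```

Stacking several such steps lets each node's representation reflect progressively larger neighborhoods, which is the kind of relational information a plain Pointer Network encoder does not model explicitly.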

Empirical results show that GPNs trained on TSP instances with 50 cities (TSP50) generalize well to instances with up to 1000 cities (TSP1000). The paper compares the tour lengths and computation times of GPNs against established heuristics such as the Lin-Kernighan-Helsgaun heuristic (LKH), nearest neighbor, and 2-opt. It also compares them against machine learning approaches such as Pointer Networks and Attention Models. GPNs do not outperform state-of-the-art solvers like LKH. However, they serve as efficient initializations that reduce computational overhead when combined with local search algorithms.
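
Using a learned tour as a starting point for local search can be sketched as follows, assuming a Euclidean instance and plain 2-opt as the local search. The paper's exact local-search configuration may differ. `two_opt` accepts any initial permutation, such as one decoded by a GPN, and applies edge-swapping moves until no move shortens the tour.

```python
import math

def tour_length(coords, tour):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))

def two_opt(coords, tour):
    """Improve a tour with 2-opt moves until no move shortens it.

    In a learned-initialization pipeline, `tour` would come from a model such
    as a GPN; here any permutation works. City tour[0] stays fixed.
    """
    best = list(tour)
    n = len(best)
    improved = True
    while improved:
        improved = False
        for i in range(1, n - 1):
            for k in range(i + 1, n):
                a, b = coords[best[i - 1]], coords[best[i]]
                c, d = coords[best[k]], coords[best[(k + 1) % n]]
                # Replacing edges (a,b),(c,d) with (a,c),(b,d) reverses best[i..k].
                delta = (math.dist(a, c) + math.dist(b, d)) - (math.dist(a, b) + math.dist(c, d))
                if delta < -1e-12:
                    best[i:k + 1] = reversed(best[i:k + 1])
                    improved = True
    return best

# Usage: a self-crossing tour on a unit square is untangled.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
crossed = [0, 2, 1, 3]
improved = two_opt(square, crossed)
```

A better starting tour typically leaves fewer improving moves to find, which is where a fast learned initializer can save local-search time.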

The authors also introduce a two-layer hierarchical GPN for the TSPTW, which handles constraints more reliably than single-layer models or penalty-based methods. The hierarchical RL approach divides the task into subtasks learned in separate layers, each with its own reward function, which improves training stability and convergence. On the TSPTW, this hierarchical architecture finds feasible solutions more often than baselines such as Google OR-Tools and Ant Colony Optimization.
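
The sketch below shows what "feasible" means for a TSPTW tour and how two per-layer rewards could be separated. Several details are illustrative assumptions and not the paper's exact definitions: travel time equal to Euclidean distance, waiting on early arrival, and lateness as the violation measure. The reward formulas `lower_layer_reward` and `upper_layer_reward`, along with the `penalty` weight, are hypothetical as well.

```python
import math

def tsptw_rollout(coords, windows, tour):
    """Simulate a closed TSPTW tour starting at time 0 from tour[0].

    windows[i] = (earliest, latest). Arriving early means waiting until
    `earliest`; arriving after `latest` adds the excess to the lateness.
    Returns (total travel time, total lateness).
    """
    time, travel, lateness = 0.0, 0.0, 0.0
    for prev, nxt in zip(tour, tour[1:] + tour[:1]):
        leg = math.dist(coords[prev], coords[nxt])
        travel += leg
        time = max(time + leg, windows[nxt][0])  # wait if early
        lateness += max(0.0, time - windows[nxt][1])
    return travel, lateness

def lower_layer_reward(lateness):
    # Hypothetical: the lower layer is rewarded only for respecting time windows.
    return -lateness

def upper_layer_reward(travel, lateness, penalty=10.0):
    # Hypothetical: the upper layer trades tour length against remaining violations.
    return -(travel + penalty * lateness)

def feasible_fraction(lateness_values):
    """Share of tours with zero time-window violation."""
    return sum(1 for v in lateness_values if v == 0.0) / len(lateness_values)

# Usage on a 3-4-5 triangle with the depot at the origin.
coords = [(0, 0), (3, 0), (3, 4)]
windows = [(0, 100), (0, 5), (0, 6)]
late_tour = tsptw_rollout(coords, windows, [0, 1, 2])
```

Separating a constraint-focused reward from a length-focused one is the intuition behind giving each layer its own objective, instead of folding everything into a single penalized reward.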

This research is relevant to fields that need optimized routing. The ability of GPNs and HGPNs to generalize across problem sizes and incorporate constraints suggests potential applications in logistics, network management, and operations research. The hierarchical RL approach also offers a promising methodology for combinatorial tasks where constraint satisfaction is critical.

Future directions may include further exploration of hierarchical architectures for other combinatorial optimization challenges and integrating advanced neural architectures like transformers. The research underscores the importance of graph-based methods in expanding the capabilities of machine learning in operational settings, laying groundwork for more sophisticated AI models.
