
Deep Controlled Learning for Inventory Control (2011.15122v7)

Published 30 Nov 2020 in cs.LG

Abstract: The application of Deep Reinforcement Learning (DRL) to inventory management is an emerging field. However, traditional DRL algorithms, originally developed for diverse domains such as game-playing and robotics, may not be well-suited for the specific challenges posed by inventory management. Consequently, these algorithms often fail to outperform established heuristics; for instance, no existing DRL approach consistently surpasses the capped base-stock policy in lost sales inventory control. This highlights a critical gap in the practical application of DRL to inventory management: the highly stochastic nature of inventory problems requires tailored solutions. In response, we propose Deep Controlled Learning (DCL), a new DRL algorithm designed for highly stochastic problems. DCL is based on approximate policy iteration and incorporates an efficient simulation mechanism, combining Sequential Halving with Common Random Numbers. Our numerical studies demonstrate that DCL consistently outperforms state-of-the-art heuristics and DRL algorithms across various inventory settings, including lost sales, perishable inventory systems, and inventory systems with random lead times. DCL achieves lower average costs in all test cases while maintaining an optimality gap of no more than 0.2%. Remarkably, this performance is achieved using the same hyperparameter set across all experiments, underscoring the robustness and generalizability of our approach. These findings contribute to the ongoing exploration of tailored DRL algorithms for inventory management, providing a foundation for further research and practical application in this area.
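
The simulation mechanism named in the abstract, Sequential Halving combined with Common Random Numbers (CRN), can be illustrated with a short sketch. The snippet below is not the authors' implementation: in DCL this selection step sits inside approximate policy iteration with a neural policy network, whereas here a toy lost-sales system with zero lead time and a hypothetical constant-order action set stand in for the real problem. All names and parameter values (simulate_cost, sequential_halving, the cost coefficients, the simulation budget) are illustrative assumptions.

```python
"""Illustrative sketch only: Sequential Halving + Common Random Numbers
for picking an order quantity in a toy lost-sales inventory simulation.
Not the DCL authors' code; all names and parameters are assumptions."""
import math
import numpy as np


def simulate_cost(order_qty, seed, horizon=20, start_inv=5,
                  hold_cost=1.0, lost_sale_cost=9.0, demand_mean=5.0):
    """Roll out a constant-order policy and return the total cost.

    Seeding the demand stream identically for every candidate action
    implements Common Random Numbers: actions are compared on the same
    demand scenarios, which removes most between-action noise.
    """
    rng = np.random.default_rng(seed)
    inv, cost = start_inv, 0.0
    for _ in range(horizon):
        inv += order_qty                          # order arrives (zero lead time in this toy)
        demand = rng.poisson(demand_mean)
        sold = min(inv, demand)
        cost += lost_sale_cost * (demand - sold)  # penalty on lost sales
        inv -= sold
        cost += hold_cost * inv                   # end-of-period holding cost
    return cost


def sequential_halving(actions, total_budget=1024):
    """Return the action with the lowest estimated mean cost.

    Sequential Halving (Karnin et al., 2013): split the simulation
    budget evenly over ~log2(n) rounds and keep the better half of the
    surviving actions after each round. Means accumulate across rounds
    here, a common practical variant of the original algorithm.
    """
    survivors = list(actions)
    totals = {a: 0.0 for a in survivors}
    counts = {a: 0 for a in survivors}
    rounds = max(1, math.ceil(math.log2(len(survivors))))
    seed = 0
    for _ in range(rounds):
        if len(survivors) == 1:
            break
        sims_per_action = max(1, total_budget // (rounds * len(survivors)))
        for _ in range(sims_per_action):
            seed += 1                             # one shared demand scenario per replication
            for a in survivors:                   # CRN: same seed for every surviving action
                totals[a] += simulate_cost(a, seed)
                counts[a] += 1
        survivors.sort(key=lambda a: totals[a] / counts[a])
        survivors = survivors[: max(1, len(survivors) // 2)]
    return survivors[0]


if __name__ == "__main__":
    best = sequential_halving(actions=range(0, 11))
    print(f"selected order quantity: {best}")
```

The CRN trick is the inner loop reusing one seed across all surviving actions, so candidates are ranked on identical demand scenarios; Sequential Halving then concentrates the remaining budget on the better half each round, which is what makes simulation-based action selection affordable in highly stochastic settings.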
