
Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications (2302.07549v2)

Published 15 Feb 2023 in cs.LG

Abstract: There is increasing interest in data-driven approaches for recommending optimal treatment strategies in many chronic disease management and critical care applications. Reinforcement learning (RL) methods are well suited to this sequential decision-making problem, but must be trained and evaluated exclusively on retrospective medical record datasets, as direct online exploration is unsafe and infeasible. Despite this requirement, the vast majority of treatment optimization studies use off-policy RL methods (e.g., Double Deep Q Networks (DDQN) or its variants) that are known to perform poorly in purely offline settings. Recent advances in offline RL, such as Conservative Q-Learning (CQL), offer a suitable alternative, but challenges remain in adapting these approaches to real-world applications where suboptimal examples dominate the retrospective dataset and strict safety constraints must be satisfied. In this work, we introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training. We perform extensive experiments on two real-world tasks, diabetes and sepsis treatment optimization, to compare the performance of the proposed approach against prominent off-policy and offline RL baselines (DDQN and CQL). Across a range of principled and clinically relevant metrics, we show that the proposed approach yields substantial improvements in expected health outcomes while remaining consistent with relevant practice and safety guidelines.
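
To make the core idea concrete, the sketch below illustrates one simple way to re-weight transition sampling so that rare actions are not drowned out by the dominant (often clinician-default) actions in an offline dataset. It is a minimal illustration only: the function names, the toy dataset, and the inverse-frequency weighting scheme are assumptions for exposition, not the authors' actual implementation.

```python
import numpy as np

def action_balanced_sampler(actions, num_actions, smoothing=1.0):
    """Per-transition sampling probabilities inversely proportional to the
    empirical frequency of each transition's action. `smoothing` (additive)
    keeps very rare actions from receiving extreme weight."""
    counts = np.bincount(actions, minlength=num_actions).astype(float) + smoothing
    inv_freq = 1.0 / counts                # rare actions -> larger weight
    weights = inv_freq[actions]            # one weight per transition
    return weights / weights.sum()         # normalize to a distribution

def sample_minibatch(dataset, probs, batch_size, rng):
    """Draw a minibatch of transitions using the balanced distribution."""
    idx = rng.choice(len(dataset["actions"]), size=batch_size, replace=True, p=probs)
    return {k: v[idx] for k, v in dataset.items()}

# Toy retrospective dataset: 10,000 transitions, 5 discrete actions with a
# heavily imbalanced action distribution (action 0 dominates).
rng = np.random.default_rng(0)
dataset = {
    "states":      rng.normal(size=(10_000, 8)).astype(np.float32),
    "actions":     rng.choice(5, size=10_000, p=[0.7, 0.15, 0.1, 0.04, 0.01]),
    "rewards":     rng.normal(size=10_000).astype(np.float32),
    "next_states": rng.normal(size=(10_000, 8)).astype(np.float32),
    "dones":       rng.random(10_000) < 0.05,
}
probs = action_balanced_sampler(dataset["actions"], num_actions=5)
batch = sample_minibatch(dataset, probs, batch_size=256, rng=rng)
# Each offline RL gradient step (e.g., for a DDQN- or CQL-style learner) would
# then use `batch` instead of a uniformly sampled minibatch, so transitions
# with rare actions contribute proportionally more training signal.
```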
