Rethinking Adversarial Inverse Reinforcement Learning: Policy Imitation, Transferable Reward Recovery and Algebraic Equilibrium Proof (2403.14593v4)

Published 21 Mar 2024 in cs.LG and stat.ML

Abstract: Adversarial inverse reinforcement learning (AIRL) stands as a cornerstone approach in imitation learning, yet it faces criticisms from prior studies. In this paper, we rethink AIRL and respond to these criticisms. Criticism 1 lies in Inadequate Policy Imitation. We show that substituting the built-in algorithm with soft actor-critic (SAC) during policy updating (which requires multiple iterations) significantly enhances the efficiency of policy imitation. Criticism 2 lies in Limited Performance in Transferable Reward Recovery Despite SAC Integration. While we find that SAC indeed yields a significant improvement in policy imitation, it introduces drawbacks for transferable reward recovery. We prove that the SAC algorithm by itself cannot comprehensively disentangle the reward function during the AIRL training process, and propose a hybrid framework, PPO-AIRL + SAC, that achieves a satisfactory transfer effect. Criticism 3 lies in Unsatisfactory Proof from the Perspective of Potential Equilibrium. We reanalyze it from an algebraic theory perspective.

