
A Pragmatic Look at Deep Imitation Learning (2108.01867v2)

Published 4 Aug 2021 in cs.LG, cs.NE, and stat.ML

Abstract: The introduction of the generative adversarial imitation learning (GAIL) algorithm has spurred the development of scalable imitation learning approaches using deep neural networks. Many of the algorithms that followed used a similar procedure, combining on-policy actor-critic algorithms with inverse reinforcement learning. More recently there has been an even greater breadth of approaches, most of which use off-policy algorithms. However, with this breadth of algorithms, everything from datasets to base reinforcement learning algorithms to evaluation settings can vary, making it difficult to compare them fairly. In this work we re-implement 6 different imitation learning (IL) algorithms, updating 3 of them to be off-policy, base them on a common off-policy algorithm (SAC), and evaluate them on a widely-used expert trajectory dataset (D4RL) for the most common benchmark (MuJoCo). After giving all algorithms the same hyperparameter optimisation budget, we compare their results for a range of expert trajectories. In summary, GAIL, with all of its improvements, consistently performs well across a range of sample sizes; AdRIL is a simple contender that performs well with one important hyperparameter to tune; and behavioural cloning remains a strong baseline when data is more plentiful.
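The component shared by GAIL and its off-policy successors discussed in the abstract is a learned discriminator whose output replaces the environment reward. The sketch below illustrates that idea only; it is not the paper's code, and the network sizes, optimiser choice, and the -log(1 - D) reward form are illustrative assumptions. The discriminator is trained to separate expert (state, action) pairs from pairs drawn from the agent's replay buffer, and the resulting surrogate reward can then be handed to an off-policy learner such as SAC.

```python
# Minimal sketch (assumptions, not the paper's implementation) of a GAIL-style
# discriminator used to produce a surrogate reward for an off-policy RL agent.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Discriminator(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, hidden_dim), nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        # Returns logits; sigmoid(logits) is the estimated probability of "expert".
        return self.net(torch.cat([state, action], dim=-1))


def discriminator_step(disc, optimiser, expert_batch, agent_batch):
    """One binary-classification update: expert pairs labelled 1, agent pairs 0."""
    expert_logits = disc(*expert_batch)
    agent_logits = disc(*agent_batch)
    loss = (
        F.binary_cross_entropy_with_logits(expert_logits, torch.ones_like(expert_logits))
        + F.binary_cross_entropy_with_logits(agent_logits, torch.zeros_like(agent_logits))
    )
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()
    return loss.item()


def imitation_reward(disc, state, action):
    """Surrogate reward fed to the RL algorithm in place of the environment
    reward; -log(1 - D) is one common choice, assumed here for illustration."""
    with torch.no_grad():
        d = torch.sigmoid(disc(state, action))
    return -torch.log(1.0 - d + 1e-8)
```

In an off-policy setting like the one studied in the paper, the agent-side batches can be drawn from the same replay buffer used by the RL algorithm, so discriminator and policy updates can be interleaved at every environment step.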
