Sample-efficient Adversarial Imitation Learning (2303.07846v2)
Abstract: Imitation learning, in which learning is performed by demonstration, has been studied and advanced for sequential decision-making tasks in which a reward function is not predefined. However, imitation learning methods still require numerous expert demonstration samples to successfully imitate an expert's behavior. To improve sample efficiency, we utilize self-supervised representation learning, which can generate vast training signals from the given data. In this study, we propose a self-supervised representation-based adversarial imitation learning method to learn state and action representations that are robust to diverse distortions and temporally predictive, on non-image control tasks. In particular, in comparison with existing self-supervised learning methods for tabular data, we propose a different corruption method for state and action representations that is robust to diverse distortions. We theoretically and empirically observe that making an informative feature manifold with less sample complexity significantly improves the performance of imitation learning. The proposed method shows a 39% relative improvement over existing adversarial imitation learning methods on MuJoCo in a setting limited to 100 expert state-action pairs. Moreover, we conduct comprehensive ablations and additional experiments using demonstrations with varying optimality to provide insights into a range of factors.
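The abstract does not spell out the proposed corruption method, so the sketch below is not the authors' technique. It illustrates only the kind of existing tabular self-supervised corruption that the abstract contrasts against: random feature corruption in the style of SCARF, where some features of each sample are replaced with values drawn from that feature's empirical marginal. The function name `corrupt_features`, the `corruption_rate` parameter, the independent per-entry mask (a simplification of choosing a fixed number of features per row), and the toy 100 x 6 array standing in for concatenated state-action pairs are all assumptions made for illustration.

```python
import numpy as np


def corrupt_features(batch, corruption_rate=0.3, rng=None):
    """Return a corrupted copy of `batch` and the boolean corruption mask.

    Hypothetical helper: each entry is replaced, with probability
    `corruption_rate`, by the value of the same feature (column) taken
    from a randomly chosen row of the batch, i.e. a draw from that
    feature's empirical marginal distribution.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_rows, n_features = batch.shape

    # True where an entry will be replaced.
    mask = rng.random((n_rows, n_features)) < corruption_rate

    # For every entry, pick a donor row; the column index is kept fixed
    # so replacements always come from the same feature.
    donor_rows = rng.integers(0, n_rows, size=(n_rows, n_features))
    replacements = batch[donor_rows, np.arange(n_features)]

    corrupted = np.where(mask, replacements, batch)
    return corrupted, mask


# Toy stand-in for 100 expert state-action pairs with 6 features each.
rng = np.random.default_rng(0)
pairs = rng.normal(size=(100, 6))
corrupted_pairs, mask = corrupt_features(pairs, corruption_rate=0.3, rng=rng)
```

In a contrastive setup, `pairs` and `corrupted_pairs` would typically be encoded as two views of the same samples and pulled together in representation space; how the paper's own corruption differs from this baseline is not stated in the abstract.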