TAIL: Task-specific Adapters for Imitation Learning with Large Pretrained Models (2310.05905v2)
Abstract: The full potential of large pretrained models remains largely untapped in control domains like robotics, mainly because of the scarcity of data and the computational cost of training or fine-tuning these large models for such applications. Prior work has emphasized either effective pretraining of large models for decision-making or single-task adaptation, but real-world problems require data-efficient, continual adaptation to new control tasks. Recognizing these constraints, we introduce TAIL (Task-specific Adapters for Imitation Learning), a framework for efficient adaptation to new control tasks. Inspired by recent advances in parameter-efficient fine-tuning in language domains, we explore efficient fine-tuning techniques -- e.g., Bottleneck Adapters, P-Tuning, and Low-Rank Adaptation (LoRA) -- in TAIL to adapt large pretrained models to new tasks with limited demonstration data. Extensive experiments on large-scale language-conditioned manipulation tasks, comparing prevalent parameter-efficient fine-tuning techniques and adaptation baselines, suggest that TAIL with LoRA achieves the best post-adaptation performance with only 1% of the trainable parameters of full fine-tuning, while avoiding catastrophic forgetting and preserving adaptation plasticity in continual learning settings.
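To make the LoRA mechanism behind the abstract's "1% of trainable parameters" figure concrete, here is a minimal NumPy sketch of a low-rank adapter on a single frozen linear layer. The dimensions, rank, and scaling are illustrative assumptions, not the paper's actual configuration; TAIL inserts such adapters into a large pretrained transformer policy, which this toy example does not reproduce.

```python
import numpy as np

# LoRA adapts a frozen pretrained weight W as
#   W_adapted = W + (alpha / r) * B @ A
# where only the low-rank factors A (r x d_in) and B (d_out x r) are trained.
# Shapes and hyperparameters below are hypothetical, for illustration only.

rng = np.random.default_rng(0)
d_in, d_out, r, alpha = 1024, 1024, 8, 16

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor
B = np.zeros((d_out, r))                   # zero-initialized, so the adapter
                                           # initially leaves the layer unchanged

def lora_forward(x):
    """Forward pass: frozen path plus scaled low-rank update."""
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T

x = rng.standard_normal((2, d_in))
y = lora_forward(x)

# With B = 0, the adapted layer matches the frozen layer exactly.
assert np.allclose(y, x @ W.T)

# Trainable fraction per layer: (r*d_in + d_out*r) / (d_out*d_in) = 2r/d here.
frac = (A.size + B.size) / W.size
print(f"trainable fraction: {frac:.3%}")
```

With rank 8 on a 1024-dimensional layer this gives roughly a 1.6% trainable fraction; across a full model, smaller ranks and adapting only selected weight matrices bring the fraction to the ~1% regime the abstract reports.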