
Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video (1910.09430v2)

Published 21 Oct 2019 in cs.CV, cs.LG, and cs.RO

Abstract: Key challenges for the deployment of reinforcement learning (RL) agents in the real world are the discovery, representation and reuse of skills in the absence of a reward function. To this end, we propose a novel approach to learn a task-agnostic skill embedding space from unlabeled multi-view videos. Our method learns a general skill embedding independently from the task context by using an adversarial loss. We combine a metric learning loss, which utilizes temporal video coherence to learn a state representation, with an entropy regularized adversarial skill-transfer loss. The metric learning loss learns a disentangled representation by attracting simultaneous viewpoints of the same observations and repelling visually similar frames from temporal neighbors. The adversarial skill-transfer loss enhances re-usability of learned skill embeddings over multiple task domains. We show that the learned embedding enables training of continuous control policies to solve novel tasks that require the interpolation of previously seen skills. Our extensive evaluation with both simulation and real world data demonstrates the effectiveness of our method in learning transferable skills from unlabeled interaction videos and composing them for new tasks. Code, pretrained models and dataset are available at http://robotskills.cs.uni-freiburg.de

Citations (30)

Summary

  • The paper introduces a novel ASN framework that creates a generalized, task-agnostic skill embedding space using adversarial loss and metric learning.
  • The methodology leverages a GAN-like encoder-discriminator setup to extract and differentiate skill representations from multi-view videos.
  • Experimental results show improved policy performance and transferable skill learning in complex tasks such as multi-object manipulation using PPO.

Adversarial Skill Networks: Unsupervised Robot Skill Learning from Video

Adversarial Skill Networks (ASN) introduce a method for learning task-agnostic, transferable skill embeddings from unlabeled multi-view videos, targeting unsupervised robot skill acquisition. The framework combines adversarial training with metric learning to support the discovery, representation, and reuse of skills in a reinforcement learning (RL) setting without task-specific reward functions.

Methodology

Skill Embedding Space

ASNs use an adversarial loss to create a generalized skill embedding space. The adversarial component is a two-network setup: an encoder extracts skill representations from video frames, while a discriminator attempts to identify the task from which an embedding originated. Training proceeds in a GAN-like fashion: the discriminator minimizes its classification error, while the encoder maximizes the entropy of the discriminator's task prediction, yielding a robust embedding space that is indifferent to specific task identifiers.

Figure 1: Given the demonstration of a new task as input, Adversarial Skill Networks yield a distance measure in skill-embedding space which can be used as the reward signal for a reinforcement learning agent for multiple tasks.
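The entropy-regularized adversarial objective described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the function names are hypothetical and real training would operate on batched network outputs.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def entropy(probs):
    """Shannon entropy of a discrete distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def discriminator_loss(task_logits, true_task):
    # The discriminator minimizes cross-entropy on the task label,
    # i.e. it tries to identify which task an embedding came from.
    probs = softmax(task_logits)
    return -math.log(probs[true_task])

def encoder_adversarial_loss(task_logits):
    # The encoder is rewarded for making the discriminator's task
    # prediction maximally uncertain (high entropy), pushing the
    # embedding toward task-agnosticism. Minimizing this loss
    # maximizes the entropy.
    probs = softmax(task_logits)
    return -entropy(probs)
```

When the discriminator's logits are uniform, its entropy is maximal and the encoder's loss reaches its minimum, which is exactly the equilibrium the adversarial game drives toward.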

Metric Learning and Adversarial Loss

The approach combines a metric learning objective based on the lifted structure loss with temporal video coherence to learn an effective state representation. Frames are sampled with time offsets to build a skill embedding in which simultaneous viewpoints of the same observation are pulled together while visually similar frames from temporal neighbors are pushed apart, yielding a disentangled, task-general representation.
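A triplet-style simplification of this attract/repel objective is sketched below. Note this is a hedged approximation: the paper uses the lifted structure variant computed over batches, and the margin value here is purely illustrative.

```python
def time_contrastive_triplet_loss(anchor, positive, negative, margin=0.2):
    # anchor:   embedding of a frame from view 1 at time t
    # positive: embedding of the simultaneous frame from view 2
    #           (attracted: same world state, different viewpoint)
    # negative: embedding of a visually similar temporal neighbor
    #           (repelled: nearby in time, different world state)
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)
```

The loss is zero once the temporal-neighbor negative sits at least `margin` farther from the anchor than the simultaneous positive does.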

Network Architecture and Training

The encoder architecture, inspired by Time-Contrastive Networks (TCN), uses an ImageNet-pretrained backbone followed by convolutional layers, a spatial softmax layer, and a fully connected layer that produces a low-dimensional embedding. The discriminator consists of fully connected layers that output a probability distribution over candidate task origins.
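The spatial softmax layer mentioned above can be sketched for a single feature channel as follows. The pure-Python grid form is for illustration only; a real encoder applies this per channel over convolutional feature maps.

```python
import math

def spatial_softmax(feature_map):
    # feature_map: 2D grid (list of rows) of activations for one channel.
    # Returns the expected (x, y) coordinate under a softmax over all
    # spatial locations -- a differentiable "where is this feature?"
    # readout that discards appearance and keeps location.
    h = len(feature_map)
    w = len(feature_map[0])
    flat = [v for row in feature_map for v in row]
    m = max(flat)  # subtract max for numerical stability
    exps = [math.exp(v - m) for v in flat]
    total = sum(exps)
    ex = ey = 0.0
    for i, p in enumerate(exps):
        y, x = divmod(i, w)
        ex += (p / total) * x
        ey += (p / total) * y
    return ex, ey
```

A sharp activation peak collapses the output to that peak's coordinates, while a flat map returns the grid center.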

A critical design consideration is a KL-divergence regularizer that maintains embedding coherence across higher temporal strides, ensuring that macro-action representations remain distinct yet adaptable across task scenarios.
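The divergence itself is the standard discrete KL, sketched below. How the two distributions are obtained in the paper (e.g., from discriminator outputs at different temporal strides) is beyond this snippet, so treat the pairing of `p` and `q` as an assumption.

```python
import math

def kl_divergence(p, q):
    # KL(p || q) between two discrete distributions given as lists of
    # probabilities. Zero iff p == q; grows as p concentrates mass
    # where q does not.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```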

Experimental Evaluation

Qualitative and Quantitative Testing

ASN's embedding models were rigorously tested across simulated and real-world datasets involving complex tasks, including multi-object stacking and manipulation. Evaluations relied on alignment loss metrics to assess skill coherence in embedding spaces, highlighting ASN's superior transfer learning capability compared to baselines such as TCN.

Figure 2: Real block tasks

Policy Learning

The ASN framework's skill transferability was further evidenced in continuous control: the learned embeddings served as a reward proxy for training PPO policies. ASN models successfully interpolated learned skills to achieve strong policy performance on novel tasks requiring unseen skill compositions.

Figure 3: Results for training a continuous control policy with PPO on the unseen Color Pushing and Color Stacking tasks with the learned reward function. The plot shows mean and standard deviation over five training runs.
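At its core, the reward proxy is a distance in the learned embedding space between the current observation and a goal demonstration frame. The sketch below conveys that idea only; the exact shaping used in the paper may differ, and `alpha` is an illustrative scale factor.

```python
def embedding_reward(z_t, z_goal, alpha=1.0):
    # Negative squared distance between the embedding of the current
    # frame (z_t) and the embedding of the goal demonstration frame
    # (z_goal). The RL agent maximizes this, i.e. it is driven to
    # reproduce the demonstrated world state -- no hand-designed
    # reward function required.
    return -alpha * sum((a - b) ** 2 for a, b in zip(z_t, z_goal))
```

At each PPO step, the current camera frame is encoded and scored against the goal embedding, turning a raw video demonstration into a dense reward signal.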

Conclusion

Adversarial Skill Networks provide a robust, unsupervised framework for learning and reusing generalized skill representations across varied tasks without requiring explicit reward designs. This method presents significant implications for extending RL and imitation learning towards real-world applications with minimal supervision, opening new avenues for scalable and adaptive robotic learning systems. Future work could explore ASN applications within sim-to-real settings and dynamically complex environments that require a higher degree of skill interpolation.
