Papers
Topics
Authors
Recent
2000 character limit reached

Autonomous Open-Ended Learning of Interdependent Tasks (1905.02690v1)

Published 7 May 2019 in cs.AI and cs.RO

Abstract: Autonomy is fundamental for artificial agents acting in complex real-world scenarios. The acquisition of many different skills is pivotal to foster versatile autonomous behaviour and thus a main objective for robotics and machine learning. Intrinsic motivations have proven to properly generate a task-agnostic signal to drive the autonomous acquisition of multiple policies in settings requiring the learning of multiple tasks. However, in real world scenarios tasks may be interdependent so that some of them may constitute the precondition for learning other ones. Despite different strategies have been used to tackle the acquisition of interdependent/hierarchical tasks, fully autonomous open-ended learning in these scenarios is still an open question. Building on previous research within the framework of intrinsically-motivated open-ended learning, we propose an architecture for robot control that tackles this problem from the point of view of decision making, i.e. treating the selection of tasks as a Markov Decision Process where the system selects the policies to be trained in order to maximise its competence over all the tasks. The system is then tested with a humanoid robot solving interdependent multiple reaching tasks.

Citations (2)

Summary

  • The paper introduces a reinforcement learning architecture that treats task selection as an MDP to optimize competency across interdependent tasks.
  • The approach integrates a dual-layer system with a goal selector and actor-critic networks, outperforming previous models in simulated iCub experiments.
  • Results indicate that competence-based intrinsic motivation effectively drives learning efficiency in complex robotic task hierarchies.

Autonomous Open-Ended Learning of Interdependent Tasks

Introduction

The paper "Autonomous Open-Ended Learning of Interdependent Tasks" (1905.02690) addresses a significant challenge in robotic learning: enabling artificial agents to autonomously acquire a diverse array of skills necessary for behavioral adaptability in complex environments. While the autonomous learning of multiple tasks can be approached with various machine learning strategies, intrinsic motivations (IMs) have emerged as a crucial factor in driving self-governed exploration and skill acquisition. Intrinsic motivations provide task-agnostic reinforcement signals essential for guiding the development of competence across tasks.

The paper specifically deals with the issue of interdependent tasks in robotic settings, where mastering certain tasks might be contingent upon the successful execution of prerequisite tasks. Traditional hierarchical reinforcement learning approaches, although relevant, often do not fully embrace the autonomy required for such settings, especially when operating in non-discrete domains or relying on predefined external instructions. To overcome these limitations, the authors propose a system where task selection is treated as a Markov Decision Process (MDP), allowing for the strategic planning of skill acquisition to optimize overall competency development.

Approach

The authors present a reinforcement learning (RL) architecture designed for robot control that effectively learns multiple interdependent tasks through the framework of an MDP. The key innovation lies in treating task selection as an MDP, framing it as an optimization problem aimed at maximizing competence across goals rather than focusing solely on extrinsic rewards. This approach leverages a competence-based intrinsic motivation function, guiding the agent to allocate its learning resources to tasks promising the greatest competence gain.

The paper outlines a dual-layer system consisting of a goal selector and multiple task-specific actor-critic networks. The goal selector evaluates current and potential goal states and chooses tasks based on current competence improvement. This selection process is modeled as a multi-armed bandit problem with a focus on competence improvement, adapting to contextual cues.

Experimental Setup

The authors demonstrate their framework on a robotic platform using a simulated iCub robot. The iCub must learn to interact with objects in an environment where task feasibility depends on current environmental conditions or the successful completion of precursor tasks. The research incorporates two primary experimental settings:

  1. Contextual Bandit Setting: The robot must activate specific tasks contingent upon environmental features. Figure 1

    Figure 1: The simulated iCub robot in our experimental setup: when a sphere is touched (given its preconditions) it ``lights up'', changing its colour to green.

  2. MDP Setting with Interdependent Tasks: Tasks are interrelated such that completing one task is required before others can be attempted.

Results

The comparative analysis involves evaluating three systems: GRAIL (Goal-Related Autonomous Intrinsic Learning), C-GRAIL (Contextual-GRAIL), and M-GRAIL (Markovian-GRAIL). In the first experimental scenario evaluating task performance based on environmental states, C-GRAIL outperforms GRAIL due to its contextual awareness, successfully adapting task choices to environmental variability. Figure 2

Figure 2

Figure 2: Performance of GRAIL and C-GRAIL in the first experiment.

In scenarios with interdependent tasks, M-GRAIL demonstrates superior learning efficiency and task execution. The M-GRAIL system capitalizes on Q-learning to propagate intrinsic motivations across goal hierarchies, facilitating the systematic fulfillment of required preconditions for complex goals more effectively than C-GRAIL, which occasionally fails due to random task selection as intrinsic motivations diminish. Figure 3

Figure 3

Figure 3: Performance of C-GRAIL and M-GRAIL over the 6 goals in the second experiment.

Conclusion

This paper contributes to the field of autonomous robotics by advancing a novel methodology for handling open-ended, interdependent task learning through intrinsic motivation-driven, hierarchical RL approaches. The proposed system shows promise for applications in autonomous systems requiring adaptive task acquisition in dynamic environments, highlighting the potential for further research in refining competence evaluation and goal selection mechanisms. Future advancements may involve further integration of adaptive learning strategies and more sophisticated competence metrics to enhance the robustness and versatility of autonomous robotic systems in real-world applications.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.