Imitating Interactive Intelligence

(2012.05672)
Published Dec 10, 2020 in cs.LG, cs.AI, and cs.MA

Abstract

A common vision from science fiction is that robots will one day inhabit our physical spaces, sense the world as we do, assist our physical labours, and communicate with us through natural language. Here we study how to design artificial agents that can interact naturally with humans using the simplification of a virtual environment. This setting nevertheless integrates a number of the central challenges of AI research: complex visual perception and goal-directed physical control, grounded language comprehension and production, and multi-agent social interaction. To build agents that can robustly interact with humans, we would ideally train them while they interact with humans. However, this is presently impractical. Therefore, we approximate the role of the human with another learned agent, and use ideas from inverse reinforcement learning to reduce the disparities between human-human and agent-agent interactive behaviour. Rigorously evaluating our agents poses a great challenge, so we develop a variety of behavioural tests, including evaluation by humans who watch videos of agents or interact directly with them. These evaluations convincingly demonstrate that interactive training and auxiliary losses improve agent behaviour beyond what is achieved by supervised learning of actions alone. Further, we demonstrate that agent capabilities generalise beyond literal experiences in the dataset. Finally, we train evaluation models whose ratings of agents agree well with human judgement, thus permitting the evaluation of new agent models without additional effort. Taken together, our results in this virtual environment provide evidence that large-scale human behavioural imitation is a promising tool to create intelligent, interactive agents, and the challenge of reliably evaluating such agents is possible to surmount.

Overview

  • The paper discusses creating intelligent agents capable of interacting with humans and environments by integrating AI research areas such as perception, motor control, and language processing.

  • A virtual 'Playroom' is used to simulate interactions and collect human behavior data, which is crucial for agent training and evaluation.

  • Agents are trained using behavioral cloning together with Language Matching and Object-in-View auxiliary losses and Generative Adversarial Imitation Learning (GAIL).

  • Training occurs in both multi-player interactive and setter replay environments to prepare agents for real-time human interaction.

  • The paper evaluates agents through automated metrics, scripted probe tasks, and human annotations, and looks at scaling and transfer learning with controlled experiments.

Overview of Agent Development and Training

Building intelligent artificial agents that can interact with humans and their environment requires integrating multiple aspects of AI research such as visual perception, motor control, natural language processing, and social interaction. In pursuit of this integration, efforts have been directed towards constructing a virtual environment, termed the "Playroom", populated with diverse objects and governed by the laws of physics. This controllable environment has been instrumental in simulating interactions and collecting large datasets of human behavior, which are essential for training and evaluating artificial agents.

Key Strategies in Agent Training

The training process for these agents is rooted in imitation learning, specifically behavioral cloning (BC), in which the agent learns by mimicking expert human behavior captured in the dataset. An essential challenge with this method is that imitating actions alone provides a sparse learning signal: the agent also needs discriminative, grounded representations so that it responds in a contextually appropriate way to visual inputs and verbal instructions. To address this, two main strategies have been employed (both sketched in code after the list below):

  1. Language Matching (LM) and Object-in-View (OV) Auxiliary Losses: By using LM and OV tasks, the agents are trained in a supervised manner to align language with vision and identify objects based on expert human behavior, fostering better object recognition and grounding of language in visual perception.

  2. Generative Adversarial Imitation Learning (GAIL): GAIL involves training a discriminator to distinguish between expert human and agent trajectories, converting its output into a reward signal for the agent. The agent then learns via reinforcement learning to generate behavior that the discriminator judges as human-like.
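
A minimal sketch of how the BC objective and the two auxiliary losses can be combined in one training step is shown below. The `policy` object, its heads (`action_head`, `lm_head`, `ov_head`), and the batch field names are illustrative placeholders, not the paper's actual interfaces.

```python
import torch
import torch.nn.functional as F

def bc_with_auxiliary_losses(policy, batch, lm_weight=1.0, ov_weight=1.0):
    # Encode the observation (image plus instruction text) once, reuse for all heads.
    state = policy.encode(batch["image"], batch["instruction"])

    # Behavioral cloning: maximize the likelihood of the human's recorded action.
    action_logits = policy.action_head(state)                      # [B, num_actions]
    bc_loss = F.cross_entropy(action_logits, batch["human_action"])

    # Language Matching (LM): binary prediction of whether the instruction shown
    # actually belongs to this observation (negatives are shuffled pairs).
    lm_logits = policy.lm_head(state).squeeze(-1)                  # [B]
    lm_loss = F.binary_cross_entropy_with_logits(
        lm_logits, batch["instruction_matches_observation"].float())

    # Object-in-View (OV): multi-label prediction of which objects are visible.
    ov_logits = policy.ov_head(state)                              # [B, num_objects]
    ov_loss = F.binary_cross_entropy_with_logits(
        ov_logits, batch["objects_in_view"].float())

    return bc_loss + lm_weight * lm_loss + ov_weight * ov_loss
```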
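
The GAIL component can likewise be sketched as a standard discriminator objective plus a reward derived from the discriminator's output. The -log(1 - D) shaping below is one common choice; the paper's exact formulation and feature inputs may differ.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(discriminator, human_features, agent_features):
    """Standard binary classification: label human trajectory windows 1, agent windows 0."""
    human_logits = discriminator(human_features)
    agent_logits = discriminator(agent_features)
    return (F.binary_cross_entropy_with_logits(human_logits, torch.ones_like(human_logits))
            + F.binary_cross_entropy_with_logits(agent_logits, torch.zeros_like(agent_logits)))

def gail_reward(discriminator, trajectory_features):
    """Convert the discriminator's output into a per-step reward: the agent is
    rewarded when its behavior is mistaken for human behavior."""
    with torch.no_grad():
        d = torch.sigmoid(discriminator(trajectory_features))   # P(human | features)
        return -torch.log(1.0 - d + 1e-8)                       # one common GAIL reward shaping
```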

Interactive Training Environment

Because interactive training requires an agent's behavior to adapt to another participant's real-time feedback, the agents have been trained in both a multi-player interactive environment and a setter replay environment. In replay episodes, pre-recorded human-setter trajectories provide consistent instructions for agents acting in the solver role. This mixed approach aims to build a bridge towards agents capable of engaging with live humans in an interactive manner.
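
This mix can be expressed as a per-episode choice between a live learned setter and a replayed human setter. The environment and agent interfaces below (`env.reset`, `env.step`, `act`) are hypothetical placeholders, and the 50/50 split is an illustrative default rather than a reported ratio.

```python
import random

def sample_episode_mode(replay_fraction=0.5):
    """Per episode, decide whether the solver faces a live learned setter
    or a replayed human-setter trajectory."""
    return "setter_replay" if random.random() < replay_fraction else "interactive"

def run_episode(env, solver, setter, recorded_setter_actions, mode):
    obs = env.reset()
    done, t = False, 0
    while not done:
        if mode == "setter_replay":
            # Replay the recorded human setter verbatim, so the solver sees
            # consistent instructions across training runs.
            setter_action = recorded_setter_actions[min(t, len(recorded_setter_actions) - 1)]
        else:
            # A learned setter generates instructions interactively in real time.
            setter_action = setter.act(obs["setter_view"])
        solver_action = solver.act(obs["solver_view"])
        obs, done = env.step(setter_action, solver_action)
        t += 1
```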

Evaluation Methodologies

The effectiveness of agent behavior and the success of training methods are measured using both automated metrics and human judgment:

  • Automated Evaluation Metrics: Metrics such as first-object-lifted accuracy and the object-mention error rate evaluate how well agents follow instructions and whether the language they produce refers to objects actually present in the virtual space (a toy version of the latter is sketched after this list).

  • Scripted Probe Tasks: Agents have also been evaluated through procedurally generated tasks that benchmark their ability to follow simple instructions and answer questions, allowing for quantitative performance measurement (see the probe-generation sketch after this list).

  • Human Annotations: Human raters annotate pre-recorded interactions between agents and their environment, or between agents and humans, providing insights into the agents' capacity to generate contextually relevant language and actions.
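
As an illustration of the automated metrics, here is a toy version of an object-mention error rate: the fraction of object words in a setter's utterances that do not name any object actually present in the room. The word-level matching is a deliberate simplification of whatever parsing the paper actually uses.

```python
def object_mention_error_rate(utterances, objects_in_room, object_vocabulary):
    """Fraction of object words mentioned by the setter that do not name any
    object actually present in the room."""
    present = {name.lower() for name in objects_in_room}
    mentioned, errors = 0, 0
    for utterance in utterances:
        for word in utterance.lower().split():
            if word in object_vocabulary:       # only count words that name objects
                mentioned += 1
                if word not in present:
                    errors += 1
    return errors / mentioned if mentioned else 0.0

rate = object_mention_error_rate(
    utterances=["lift the blue duck", "put the robot on the bed"],
    objects_in_room=["duck", "bed", "table"],
    object_vocabulary={"duck", "robot", "bed", "table", "car"})
# "robot" is mentioned but absent, so rate = 1/3
```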
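
Scripted probe tasks can similarly be illustrated with a small procedural generator for "lift" instructions. The template, distractor count, and success condition below are hypothetical and intended only to convey the idea.

```python
import random

def generate_lift_probe(colors, objects, rng=None):
    """Build one scripted 'lift' probe: a templated instruction, distractor
    objects, and a success condition an automated checker can evaluate."""
    rng = rng or random.Random(0)
    color, target = rng.choice(colors), rng.choice(objects)
    distractors = rng.sample([o for o in objects if o != target], k=2)
    instruction = f"Lift the {color} {target}."
    success = {"lifted_object": target, "lifted_color": color}
    return instruction, distractors, success

instruction, distractors, success = generate_lift_probe(
    colors=["red", "blue", "green"], objects=["duck", "bus", "book", "train"])
```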

Scaling and Transfer Learning Experiments

To study how agent performance scales with data and how well learned behavior transfers to new tasks, controlled experiments have been conducted. These include training on multiple tasks to determine whether multitask learning makes learning a new task more data-efficient, and removing certain object-color combinations from the training data to test whether agents generalize to unseen color-object pairs.
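
The color-object generalization experiment can be sketched as building a training/evaluation split over color-object pairs. The specific colors, objects, and held-out pairs below are illustrative, not the paper's actual split.

```python
import itertools

def make_generalization_split(colors, objects, held_out_pairs):
    """Split color-object combinations so the held-out pairs never appear in
    training but can be probed at evaluation time."""
    all_pairs = set(itertools.product(colors, objects))
    held_out = set(held_out_pairs) & all_pairs
    train_pairs = all_pairs - held_out
    return train_pairs, held_out

train_pairs, eval_pairs = make_generalization_split(
    colors=["red", "blue", "orange"],
    objects=["duck", "bus", "book"],
    held_out_pairs=[("orange", "duck"), ("blue", "book")])
# Training episodes would then exclude any target matching an eval pair.
```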

Outlook and Future Directions

The integration of perception, control, and language through large-scale data-driven approaches has shown promising results in developing interactive agents. However, further research is necessary to refine agent behaviors beyond imitation to more sophisticated understanding and proactive assistance. Enhancements in knowledge representation, advanced credit assignment techniques, and augmentation of real-world datasets are some of the avenues being explored to achieve genuinely intelligent and versatile agents.

Note: The strategies, results, and methodologies outlined in this post will form the foundation for advancing research on interaction-capable artificial intelligence.
