
Contact Energy Based Hindsight Experience Prioritization

(2312.02677)
Published Dec 5, 2023 in cs.RO and cs.AI

Abstract

Multi-goal robot manipulation tasks with sparse rewards are difficult for reinforcement learning (RL) algorithms due to the inefficiency in collecting successful experiences. Recent algorithms such as Hindsight Experience Replay (HER) expedite learning by taking advantage of failed trajectories, replacing the desired goal with one of the achieved states so that any failed trajectory can contribute to learning. However, HER chooses failed trajectories uniformly, without taking into account which ones might be the most valuable for learning. In this paper, we address this problem and propose a novel approach, Contact Energy Based Prioritization (CEBP), to select samples from the replay buffer based on the rich information due to contact, leveraging the touch sensors in the gripper of the robot and object displacement. Our prioritization scheme favors sampling of contact-rich experiences, which are arguably the ones providing the largest amount of information. We evaluate our proposed approach on various sparse reward robotic tasks and compare it with state-of-the-art methods. We show that our method surpasses or performs on par with those methods on robot manipulation tasks. Finally, we deploy the trained policy from our method to a real Franka robot for a pick-and-place task. We observe that the robot can solve the task successfully. The videos and code are publicly available at: https://erdiphd.github.io/HER_force

Robotic arm performing a series of pick-and-place tasks.

Overview

  • This paper presents Contact Energy Based Prioritization (CEBP), a novel method to enhance reinforcement learning (RL) for robotic manipulation by selectively prioritizing contact-rich experiences using touch sensors.

  • The approach computes a 'contact energy' from tactile feedback and object displacement, then smooths it with a sigmoid function so that prioritization retains nuanced, continuous information for more effective learning.

  • Experimental results demonstrate CEBP's superior performance in learning efficiency and success rate across various tasks, highlighting its practical viability and potential for robust simulation-to-reality transfer.


Introduction

The paper "Contact Energy Based Hindsight Experience Prioritization" introduces a novel approach to enhance the performance of reinforcement learning (RL) for robot manipulation tasks with sparse rewards. Traditional RL algorithms often struggle with sparse rewards due to the rarity of successful experiences, which hampers efficient learning. Hindsight Experience Replay (HER) partly addresses this by reinterpreting failures to contribute useful learning experiences. However, it uniformly samples these failed trajectories, failing to account for their varying utility in learning. This paper proposes Contact Energy Based Prioritization (CEBP), which selectively prioritizes contact-rich experiences, leveraging touch sensors in robot grippers to enhance learning efficiency.

Methodology

CEBP enhances HER by integrating tactile feedback and object displacement to prioritize experiences. This approach hinges on the insight that contact-rich interactions, measured through touch sensors on the robot’s gripper, provide substantial information valuable for learning. The framework computes "contact energy" by summing contact forces and multiplying them by the Euclidean norm of object displacement. This contact energy is then smoothed using a sigmoid function, instead of reducing contact forces to discrete values as seen in CPER, thus retaining a rich, continuous range of information.
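
As a rough illustration of this computation, the sketch below accumulates a per-trajectory contact energy in Python/NumPy and smooths it with a temperature-scaled sigmoid. The function names, array shapes, and per-timestep aggregation are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def contact_energy(forces, positions):
    """Per-trajectory contact energy: at every timestep, sum the gripper's
    touch-sensor readings and weight them by how far the object moved.

    forces:    (T, n_sensors) touch-sensor readings from the gripper (assumed layout)
    positions: (T + 1, 3) object positions, so np.diff yields T displacements
    """
    displacement = np.linalg.norm(np.diff(positions, axis=0), axis=1)  # (T,)
    return float(np.sum(forces.sum(axis=1) * displacement))

def priority(energy, temperature=1.0):
    """Squash raw energy into (0, 1) with a temperature-scaled sigmoid,
    keeping a continuous range rather than a discrete contact threshold."""
    return 1.0 / (1.0 + np.exp(-energy / temperature))
```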

The main contributions of the approach include:

  1. Incorporating touch sensors while relying solely on a binary sparse reward signal, avoiding dense reward engineering.
  2. Introducing a novel replay-buffer prioritization scheme that uses contact energy to favor information-rich experiences (a sampling sketch follows this list).
  3. Demonstrating the proposed method on various robotic manipulation tasks in simulation and transferring the learned policy to a real-world robot setup.
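
A minimal sketch of how such priorities could drive replay sampling, assuming trajectories are drawn with probability proportional to their sigmoid-smoothed contact energy (the exact normalization in the paper may differ):

```python
import numpy as np

def sample_trajectories(priorities, batch_size, rng=None):
    """Draw trajectory indices with probability proportional to their
    sigmoid-smoothed contact energies (proportional rule is an assumption)."""
    if rng is None:
        rng = np.random.default_rng()
    priorities = np.asarray(priorities, dtype=float)
    probs = priorities / priorities.sum()
    return rng.choice(len(priorities), size=batch_size, p=probs)

# e.g. sample_trajectories([0.52, 0.88, 0.99], batch_size=2)
```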

Performance is evaluated on three standard tasks: FetchPickAndPlace, FetchPush, and FetchSlide, all in the MuJoCo simulation environment, and is further validated through Sim2Real transfer on a Franka robot.
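
For reference, these Fetch tasks are available through the Gymnasium-Robotics package; the snippet below shows a minimal random-action rollout. The "-v2" environment IDs and the registration call are assumptions about current package versions, and the paper does not state which wrapper library it used:

```python
import gymnasium as gym
import gymnasium_robotics

gym.register_envs(gymnasium_robotics)  # registers the Fetch environments

for env_id in ("FetchPickAndPlace-v2", "FetchPush-v2", "FetchSlide-v2"):
    env = gym.make(env_id)
    obs, info = env.reset(seed=0)
    for _ in range(50):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, info = env.reset()
    env.close()
```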

Experimental Results

Experimental results indicate that CEBP consistently outperforms existing baselines in learning efficiency and success rate, particularly on the pick-and-place and push tasks. The prioritization method converges to a near-optimal policy markedly faster than competing methods; in the push task, for instance, it pulled ahead of the baselines early in training. Although CPER learned faster initially on the slide task, CEBP matched its final performance with lower variance, indicating robustness.

A notable insight from the experiments is the utility of continuous contact force values over discrete thresholding in CPER. This continuous range facilitates a more nuanced learning process, as confirmed in the ablation study investigating the impact of the sigmoid function's temperature parameter on learning performance.
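
A toy illustration of that temperature effect, reusing the sigmoid form sketched earlier with hypothetical energy values:

```python
import numpy as np

def priority(energy, temperature):
    return 1.0 / (1.0 + np.exp(-np.asarray(energy) / temperature))

energies = [0.1, 1.0, 5.0, 20.0]  # hypothetical contact energies
for tau in (0.5, 2.0, 10.0):
    print(f"tau={tau:>4}: {np.round(priority(energies, tau), 3)}")
# Small tau saturates high-energy trajectories toward 1 (approaching a hard,
# threshold-like rule); large tau compresses all priorities toward 0.5,
# flattening the sampling distribution.
```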

Theoretical and Practical Implications

Theoretical implications of CEBP include a refined approach to experience prioritization in RL that better harnesses tactile feedback. Practically, this method promises more efficient training regimes for robots in real-world settings, reducing the reliance on extensive reward engineering. The successful application in a real-world Franka robot underscores its practical viability, although further work is needed to tackle challenges like occlusion in push tasks and simulation-reality friction discrepancies in slide tasks.

Future Directions

Further research might explore:

  1. Extended Applications: Adapting CEBP to other RL domains and tasks beyond robotic manipulation.
  2. Meta-RL Integration: Incorporating sparse rewards and hindsight experience prioritization into meta-reinforcement learning frameworks could enable scalable, fast-adapting RL agents.
  3. Robust Sim2Real Transfer: Enhancing the robustness of policy transfer from simulation to reality with multiple camera perspectives and refined modeling of environmental parameters.

Conclusion

Contact Energy Based Hindsight Experience Prioritization marks a significant step toward more efficient learning in robotic manipulation by leveraging tactile sensing and object displacement. By prioritizing replay-buffer experiences in a nuanced, continuous way, CEBP enhances the learning process for RL agents, paving the way for more adaptable and proficient robotic applications.
