
Deep Event Visual Odometry

(2312.09800)
Published Dec 15, 2023 in cs.CV and cs.RO

Abstract

Event cameras offer the exciting possibility of tracking the camera's pose during high-speed motion and in adverse lighting conditions. Despite this promise, existing event-based monocular visual odometry (VO) approaches demonstrate limited performance on recent benchmarks. To address this limitation, some methods resort to additional sensors such as IMUs, stereo event cameras, or frame-based cameras. Nonetheless, these additional sensors limit the application of event cameras in real-world devices since they increase cost and complicate system requirements. Moreover, relying on a frame-based camera makes the system susceptible to motion blur and to failure in high-dynamic-range (HDR) scenes. To remove the dependency on additional sensors and to push the limits of using only a single event camera, we present Deep Event VO (DEVO), the first monocular event-only system with strong performance on a large number of real-world benchmarks. DEVO sparsely tracks selected event patches over time. A key component of DEVO is a novel deep patch selection mechanism tailored to event data. We significantly decrease the pose tracking error on seven real-world benchmarks by up to 97% compared to event-only methods and often surpass or are close to stereo or inertial methods. Code is available at https://github.com/tum-vision/DEVO

Overview

  • Event cameras capture pixel-level brightness changes with high temporal resolution and high dynamic range, properties well suited to visual odometry (VO).

  • Current event-based VO systems often need additional sensors to perform well, but these increase cost and system complexity, and frame-based cameras in particular reintroduce issues like motion blur.

  • The paper introduces Deep Event Visual Odometry (DEVO), a monocular event-only VO system using a deep learning approach for improved pose tracking.

  • DEVO is trained on simulated data and evaluated on real-world benchmarks, where it demonstrates superior accuracy and robustness.

  • DEVO's code, including training, evaluation, and event data generation, is open-source, promoting further research in event-based VO without reliance on additional sensory inputs.

Introduction

Event cameras are sensors that differ significantly from traditional cameras, capturing pixel-level brightness changes asynchronously with high temporal resolution and dynamic range. These unique attributes open up promising avenues for tracking camera movement, known as visual odometry (VO), especially during high-speed motion or in challenging lighting conditions. However, existing event-based VO approaches, though robust in adverse conditions, still fall short in accuracy and often rely on additional sensory inputs such as inertial measurement units (IMUs), stereo event cameras, or frame-based cameras to achieve satisfactory results. These additional inputs complicate the system and increase cost, and a frame-based camera in particular reintroduces motion blur and reduced dynamic range.
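To make the sensing model concrete, the sketch below implements the idealized per-pixel event generation rule that most event-camera simulators are built on: a pixel fires an event whenever its log-intensity drifts by more than a contrast threshold since that pixel's last event. This is a simplified, illustrative version (real simulators interpolate exact crossing times and can fire several events per interval), and all names and values here are assumptions, not code from the paper.

```python
import numpy as np

def ideal_event_generation(log_intensity, timestamps, threshold=0.2):
    """Simplified event model: emit (t, y, x, polarity) whenever the
    per-pixel log-intensity changes by >= threshold since the pixel's
    last event. `log_intensity` is a (T, H, W) array of frames."""
    ref = log_intensity[0].copy()              # per-pixel reference level
    events = []
    for t in range(1, log_intensity.shape[0]):
        diff = log_intensity[t] - ref
        fired = np.abs(diff) >= threshold      # pixels crossing the threshold
        ys, xs = np.nonzero(fired)
        for y, x in zip(ys, xs):
            events.append((timestamps[t], y, x, int(np.sign(diff[y, x]))))
        ref[fired] = log_intensity[t][fired]   # reset reference where fired
    return events
```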

Towards Monocular Event-Only Visual Odometry

To address these challenges, this research introduces Deep Event Visual Odometry (DEVO), a monocular, event-only VO system designed to work robustly across various real-world benchmarks. DEVO sparsely selects and tracks event "patches" over time, using a deep patch-selection mechanism tailored to event data. This design reduces pose tracking error substantially, by up to 97% compared to prior event-only methods, often rivaling or surpassing the performance of stereo or inertial methods without the need for additional sensors.
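Because neural networks consume dense tensors rather than asynchronous streams, learning-based event pipelines first bin the events into a voxel grid (the representation behind the paper's "photometric voxel augmentations"). The following is a minimal sketch of this standard conversion with bilinear voting along the time axis; the function name and binning details are common practice, not DEVO's exact implementation.

```python
import numpy as np

def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Bin events (integer pixel coords xs/ys, timestamps ts, polarities
    ps) into a (num_bins, H, W) voxel grid with bilinear temporal votes."""
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    # Normalize timestamps to the continuous bin range [0, num_bins - 1].
    t_norm = (ts - ts[0]) / max(ts[-1] - ts[0], 1e-9) * (num_bins - 1)
    t0 = np.floor(t_norm).astype(int)
    frac = t_norm - t0
    pol = np.where(ps > 0, 1.0, -1.0)
    # Each event votes into its two neighboring time bins.
    np.add.at(voxel, (t0, ys, xs), pol * (1.0 - frac))
    valid = t0 + 1 < num_bins
    np.add.at(voxel, (t0[valid] + 1, ys[valid], xs[valid]), (pol * frac)[valid])
    return voxel
```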

Deep Event Visual Odometry (DEVO)

DEVO associates incoming events with a sparse set of patches for tracking, leveraging a tailored neural network that predicts which regions of the event data are most promising for accurate VO. The system estimates camera poses and patch depths from sequences of event data using an iterative process that refines optical flow predictions and adjusts the trajectories of the event patches. A key component of DEVO is its patch selection network, developed specifically for the sparse and temporally rich nature of event camera data.
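The summary does not expose the network's interfaces, but score-based patch selection typically looks like the sketch below: a per-pixel score map (in DEVO, the output of the learned selection network) is suppressed to local maxima, the top-k locations become patch centers, and small crops of the event voxel are extracted around them. All names, sizes, and the NMS scheme are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def select_event_patches(score_map, voxel, num_patches=96, patch_size=3):
    """Pick the top-k locally maximal scores as patch centers and crop
    (num_bins, p, p) patches from the event voxel around each center."""
    h, w = score_map.shape
    # Non-maximum suppression via max pooling keeps selections spread out.
    pooled = F.max_pool2d(score_map[None, None], 5, stride=1, padding=2)[0, 0]
    nms = torch.where(score_map == pooled, score_map, torch.zeros_like(score_map))
    idx = torch.topk(nms.flatten(), num_patches).indices
    ys, xs = idx // w, idx % w
    r = patch_size // 2
    patches = []
    for y, x in zip(ys.tolist(), xs.tolist()):
        y0 = min(max(y - r, 0), h - patch_size)  # clamp crops at the borders
        x0 = min(max(x - r, 0), w - patch_size)
        patches.append(voxel[:, y0:y0 + patch_size, x0:x0 + patch_size])
    return torch.stack(patches), torch.stack((ys, xs), dim=1)
```

For a quick smoke test, `score_map` can be `torch.rand(480, 640)` and `voxel` a `torch.zeros(5, 480, 640)` tensor; in the real system the score map would come from the learned selection network.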

The network is trained on a large simulated dataset and evaluated against multiple real-world benchmarks, surpassing previous event-based methods in accuracy and robustness. Additionally, DEVO applies photometric voxel augmentations during training, compensating for the sim-to-real gap that arises from the idealized event generation models used in simulation.
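The exact augmentation recipe is not given in this summary, but photometric augmentations on event voxels commonly combine a random gain, additive noise, and random dropout of voxel entries, as in the hedged sketch below. Parameter names and magnitudes are assumptions chosen for illustration.

```python
import torch

def photometric_voxel_aug(voxel, max_gain=0.2, noise_std=0.05, p_drop=0.05):
    """Illustrative train-time perturbations of an event voxel grid."""
    # Random gain mimics contrast-threshold / sensitivity mismatch.
    gain = 1.0 + (2.0 * torch.rand(1) - 1.0) * max_gain
    voxel = voxel * gain
    # Additive Gaussian noise mimics spurious noise events.
    voxel = voxel + noise_std * torch.randn_like(voxel)
    # Random dropout mimics missed events and dead pixels.
    mask = (torch.rand_like(voxel) > p_drop).float()
    return voxel * mask
```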

Evaluation and Open-Source Contributions

The evaluation of DEVO across seven real-world benchmarks highlights its capacity to generalize from simulation to a diverse array of real-world conditions without extensive parameter tuning. The evaluation reveals that DEVO often outperforms comparable methods that use additional sensors like IMUs or stereo cameras, demonstrating the efficacy of learning-based VO with event data.
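Pose tracking error on such benchmarks is conventionally reported as absolute trajectory error (ATE): the estimated trajectory is first aligned to ground truth with a closed-form rigid fit, then the RMSE of the position residuals is reported. Below is a minimal sketch of this standard procedure (Kabsch/Umeyama-style alignment); monocular evaluations usually also fit a scale factor, which is omitted here for brevity.

```python
import numpy as np

def absolute_trajectory_error(gt, est):
    """ATE-RMSE between time-associated (N, 3) position arrays, after
    rigidly aligning the estimate to the ground truth."""
    mu_g, mu_e = gt.mean(0), est.mean(0)
    G, E = gt - mu_g, est - mu_e                 # centered point clouds
    U, _, Vt = np.linalg.svd(E.T @ G)            # cross-covariance SVD
    S = np.eye(3)
    if np.linalg.det(U @ Vt) < 0:                # guard against reflections
        S[2, 2] = -1.0
    R = (U @ S @ Vt).T                           # rotation aligning est -> gt
    aligned = (R @ E.T).T + mu_g
    return float(np.sqrt(np.mean(np.sum((aligned - gt) ** 2, axis=1))))
```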

To contribute to the research community and encourage further advancements in event-based vision, the authors have released their code, including training, evaluation, and event data generation, as open source. This transparency ensures that future research can build upon their findings and develop this approach to visual odometry further.
