Multi-Event-Camera Depth Estimation and Outlier Rejection by Refocused Events Fusion (2207.10494v2)

Published 21 Jul 2022 in cs.CV, cs.RO, and eess.SP

Abstract: Event cameras are bio-inspired sensors that offer advantages over traditional cameras. They operate asynchronously, sampling the scene at microsecond resolution and producing a stream of brightness changes. This unconventional output has sparked novel computer vision methods to unlock the camera's potential. Here, the problem of event-based stereo 3D reconstruction for SLAM is considered. Most event-based stereo methods attempt to exploit the high temporal resolution of the camera and the simultaneity of events across cameras to establish matches and estimate depth. By contrast, this work investigates how to estimate depth without explicit data association by fusing Disparity Space Images (DSIs) originated in efficient monocular methods. Fusion theory is developed and applied to design multi-camera 3D reconstruction algorithms that produce state-of-the-art results, as confirmed by comparisons with four baseline methods and tests on a variety of available datasets.

Citations (25)

Summary

  • The paper introduces a fusion architecture that aligns Disparity Space Images across multiple event cameras to improve 3D reconstruction.
  • It combines temporal fusion and adaptive Gaussian thresholding to effectively filter out outliers while boosting depth accuracy.
  • Empirical evaluations demonstrate state-of-the-art performance, paving the way for advanced event-based visual SLAM systems.

Multi-Event-Camera Depth Estimation: An Overview

The paper "Multi-Event-Camera Depth Estimation and Outlier Rejection by Refocused Events Fusion" introduces a novel approach to 3D reconstruction using event cameras. Event cameras, distinct from traditional frame-based cameras, capture changes in brightness asynchronously, producing events representative of brightness fluctuations at microsecond intervals. This unique data stream necessitates innovative methodologies to unlock the latent potential of such cameras, particularly in stereo 3D reconstruction for Simultaneous Localization and Mapping (SLAM).

Key Contributions and Methodology

The presented approach diverges from conventional event-based stereo methods, which exploit the camera's high temporal resolution and the simultaneity of events across cameras to establish data associations and estimate depth. Instead, this work proposes an event fusion strategy that leverages Disparity Space Images (DSIs), originally developed for efficient monocular methods, to achieve robust multi-camera 3D reconstruction.
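As background on how a single-camera DSI is built, the sketch below illustrates the ray-counting space-sweep idea behind such monocular methods: each event is back-projected as a viewing ray, intersected with a set of depth planes in a reference frame, and counted as a vote. The function name `build_dsi`, the pose accessor `cam_to_ref_pose`, and the assumption of shared intrinsics `K` are illustrative simplifications, not the paper's implementation.

```python
import numpy as np

def build_dsi(events, cam_to_ref_pose, K, depth_planes, width, height):
    """Accumulate a Disparity Space Image by counting back-projected event rays."""
    dsi = np.zeros((len(depth_planes), height, width), dtype=np.float32)
    K_inv = np.linalg.inv(K)

    for ev in events:
        R, t = cam_to_ref_pose(ev.t)          # event-camera pose in the reference frame
        origin = t                            # ray origin
        direction = R @ (K_inv @ np.array([ev.x, ev.y, 1.0]))  # ray direction

        for k, z in enumerate(depth_planes):  # sweep fronto-parallel planes Z = z
            if abs(direction[2]) < 1e-9:
                continue
            s = (z - origin[2]) / direction[2]
            if s <= 0:
                continue                      # plane lies behind the camera
            point = origin + s * direction
            u, v, w = K @ point               # project into the reference view
            col, row = int(round(u / w)), int(round(v / w))
            if 0 <= col < width and 0 <= row < height:
                dsi[k, row, col] += 1.0       # one ray vote per traversed cell
    return dsi
```

Cells traversed by many rays mark likely scene structure; per-pixel maxima of this volume yield a depth estimate and a ray-density confidence.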

Core Contributions:

  1. Fusion Architecture: The authors develop a fusion theory for designing stereo algorithms that integrate data from multiple event cameras without explicit event matching. The core idea is to align DSIs across synchronized cameras and combine them with fusion functions such as the harmonic mean, which emphasizes regions of high ray density seen from multiple views (a sketch follows this list).
  2. Temporal and Camera Fusion: The method also incorporates temporal fusion, in which DSIs computed over sub-intervals of the event stream are combined, strengthening the reconstruction by exploiting both spatial (multi-camera) and temporal redundancy.
  3. Outlier Rejection: Adaptive Gaussian thresholding applied to the confidence maps generated from the fused DSIs filters out outliers, improving the reliability and accuracy of the resulting depth maps.
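The sketch below illustrates items 1 and 3 under simplifying assumptions: per-camera DSIs already aligned in a common reference volume are fused with a harmonic mean, and depth is kept only where an adaptively thresholded confidence map passes. The use of OpenCV and the parameters `block_size` and `offset` are illustrative choices, not the paper's exact settings.

```python
import numpy as np
import cv2

def fuse_dsis_harmonic(dsis, eps=1e-6):
    """Harmonic-mean fusion of aligned per-camera DSIs.

    The harmonic mean stays large only where all cameras report high
    ray density, suppressing cells supported by a single view."""
    stack = np.stack(dsis, axis=0).astype(np.float32)
    return len(dsis) / np.sum(1.0 / (stack + eps), axis=0)

def depth_from_fused_dsi(fused_dsi, depth_planes, block_size=15, offset=-5):
    """Extract a semi-dense depth map with adaptive-threshold outlier rejection."""
    best_idx = np.argmax(fused_dsi, axis=0)    # best depth plane per pixel
    confidence = np.max(fused_dsi, axis=0)     # ray density at that plane

    conf_u8 = cv2.normalize(confidence, None, 0, 255,
                            cv2.NORM_MINMAX).astype(np.uint8)
    mask = cv2.adaptiveThreshold(conf_u8, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY, block_size, offset)

    depth = np.asarray(depth_planes, dtype=np.float32)[best_idx]
    depth[mask == 0] = np.nan                  # drop low-confidence pixels
    return depth, mask
```

With a negative `offset`, a pixel survives only if its confidence exceeds its local Gaussian-weighted mean by a margin, which keeps sharp ray-density peaks and discards diffuse background votes.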

Robust Experimental Evaluation

The method's efficacy is substantiated through evaluations on diverse datasets captured with different event cameras and under varying conditions. Compared against four baseline methods, it achieves state-of-the-art depth accuracy and outlier rejection, indicating meaningful progress for event-based vision systems.

Broader Implications and Future Directions

This research marks a step forward in event-based vision, particularly for autonomous systems operating in challenging environments where the high-speed and high-dynamic-range (HDR) capabilities of event cameras are advantageous. By avoiding the explicit event matching on which prior event-based stereo methods rely, the approach points toward future systems that require real-time, accurate depth estimation.

Theoretical and Practical Implications:

  • The work lays a foundational framework for future AI systems to leverage event-based cameras for tasks beyond simple visual odometry, extending into areas like dynamic scene understanding and navigation in low-light scenarios.
  • The method's adaptability to high-resolution event data and robustness to calibration errors pave the way for applications in advanced robotic systems and beyond.
  • Future research could aim to integrate this reconstruction framework with robust tracking algorithms to develop a full-fledged event-based visual SLAM pipeline, optimizing system robustness and performance in real-world scenarios.

Conclusion:

The fusion method introduced here represents a meaningful evolution in stereo depth estimation with event cameras. As these sensors continue to improve in spatial and temporal resolution while consuming less power, approaches such as the one developed in this paper are likely to play a pivotal role in the growing landscape of intelligent autonomous systems.
