Vision-based Situational Graphs Exploiting Fiducial Markers for the Integration of Semantic Entities

Published 19 Sep 2023 in cs.RO | (2309.10461v2)

Abstract: Situational Graphs (S-Graphs) merge geometric models of the environment generated by Simultaneous Localization and Mapping (SLAM) approaches with 3D scene graphs into a multi-layered jointly optimizable factor graph. As an advantage, S-Graphs not only offer a more comprehensive robotic situational awareness by combining geometric maps with diverse hierarchically organized semantic entities and their topological relationships within one graph, but they also lead to improved performance of localization and mapping on the SLAM level by exploiting semantic information. In this paper, we introduce a vision-based version of S-Graphs where a conventional Visual SLAM (VSLAM) system is used for low-level feature tracking and mapping. In addition, the framework exploits the potential of fiducial markers (both visible as well as our recently introduced transparent or fully invisible markers) to encode comprehensive information about environments and the objects within them. The markers aid in identifying and mapping structural-level semantic entities, including walls and doors in the environment, with reliable poses in the global reference, subsequently establishing meaningful associations with higher-level entities, including corridors and rooms. However, in addition to including semantic entities, the semantic and geometric constraints imposed by the fiducial markers are also utilized to improve the reconstructed map's quality and reduce localization errors. Experimental results on a real-world dataset collected using legged robots show that our framework excels in crafting a richer, multi-layered hierarchical map and enhances robot pose accuracy at the same time.


Authors (6)

Citations (2)


Summary

The paper introduces a novel SLAM system that integrates fiducial markers to generate three-layered, optimizable situational graphs for enriched 3D mapping.
The methodology enhances map accuracy by imposing semantic constraints from detected entities such as walls, doors, corridors, and rooms.
Empirical evaluations show reduced trajectory error and improved robustness compared to UcoSLAM and ORB-SLAM 3.0.

Overview of "Vision-based Situational Graphs Generating Optimizable 3D Scene Representations"

The paper "Vision-based Situational Graphs Generating Optimizable 3D Scene Representations" presents a framework that enhances visual SLAM (Simultaneous Localization and Mapping) by using RGB-D cameras and fiducial markers to enrich environment maps with semantic information. The result is a three-layered, hierarchically structured situational graph that improves both the interpretability and the accuracy of the reconstructed 3D scene.

Framework and Principal Contributions

This work builds upon ORB-SLAM 3.0 and incorporates fiducial markers to introduce additional constraints that help form a richer, multi-layered representation of the environment. The framework captures and processes visual data to detect and integrate semantic elements such as walls, doorways, corridors, and rooms. Fiducial markers serve a dual purpose: they improve tracking and provide semantic references that turn purely geometric maps into semantically rich graphs.

The paper's novel contributions are enumerated as follows:

Marker-based Framework: A novel methodology for employing fiducial markers within an RGB-D supported visual sensor framework to generate optimizable situational graphs.
Semantic Extraction and Mapping: A process that introduces semantic constraints through wall, door, corridor, and room detection, improving map quality and localization accuracy.
Enhanced Optimization: A tight coupling between robot poses and the hierarchical representation within a single graph structure, modeled on, and compared against, existing LiDAR-based frameworks.
Imperceptible Fiducial Markers: An exploration of imperceptible markers in SLAM applications, aimed at reducing visual pollution of the environment while keeping map generation robust.
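
The three-layered structure described in the contributions above can be pictured as a small typed graph. The following is a minimal Python sketch, not the authors' implementation: the layer names (`tracking`, `structural`, `rooms`), the node identifiers, and the `SituationalGraph` class are all hypothetical, chosen only to illustrate how markers might link to walls and walls to rooms.

```python
from dataclasses import dataclass, field


@dataclass
class SituationalGraph:
    """Toy three-layer graph (hypothetical names, for illustration only)."""

    nodes: dict = field(default_factory=dict)  # node id -> layer name
    edges: set = field(default_factory=set)    # (child id, parent id) pairs

    def add_node(self, node_id: str, layer: str) -> None:
        self.nodes[node_id] = layer

    def add_edge(self, child: str, parent: str) -> None:
        # Refuse to link nodes that were never registered.
        if child not in self.nodes or parent not in self.nodes:
            raise KeyError("both endpoints must be added before linking them")
        self.edges.add((child, parent))

    def children(self, parent: str) -> list:
        """Return the sorted ids of all nodes linked to `parent`."""
        return sorted(c for c, p in self.edges if p == parent)


def build_example() -> SituationalGraph:
    g = SituationalGraph()
    # Assumed layer 1: keyframes and the markers they observe.
    for node_id in ("kf0", "marker7", "marker9"):
        g.add_node(node_id, "tracking")
    # Assumed layer 2: structural entities such as walls and doors.
    for node_id in ("wall_a", "wall_b"):
        g.add_node(node_id, "structural")
    # Assumed layer 3: higher-level entities such as rooms and corridors.
    g.add_node("room1", "rooms")

    g.add_edge("kf0", "marker7")       # keyframe observes a marker
    g.add_edge("marker7", "wall_a")    # marker is attached to a wall
    g.add_edge("marker9", "wall_b")
    g.add_edge("wall_a", "room1")      # walls bound a room
    g.add_edge("wall_b", "room1")
    return g


if __name__ == "__main__":
    graph = build_example()
    print(graph.children("room1"))
```

In a real system each edge would carry a measurement and a cost term rather than a bare link; the sketch only shows the hierarchy.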

Method and Evaluation

The method relies on established geometric SLAM techniques, extended with additional optimization terms that incorporate semantic data into the mapping process. The key step is a semantic perception module that assigns relational constraints among the identified entities, forming a coherent and optimizable situational graph.
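
To make the idea of a relational constraint concrete, here is a small Python sketch of two geometric relations that such a graph could encode: a marker lying on a wall plane, and a room center derived from opposing walls. This is an illustrative assumption, not the formulation used in the paper; the function names and the choice of a signed point-to-plane distance as the residual are hypothetical.

```python
import math


def plane_from_marker(position, normal):
    """Return (unit normal n, offset d) of the plane n.x = d through a marker.

    Assumes the marker is flat on the wall, so its normal is the wall normal.
    """
    length = math.sqrt(sum(c * c for c in normal))
    if length == 0.0:
        raise ValueError("normal must be non-zero")
    n = tuple(c / length for c in normal)
    d = sum(a * b for a, b in zip(n, position))
    return n, d


def point_to_plane_residual(point, plane):
    """Signed distance from a point to the plane; zero when the point lies on it."""
    n, d = plane
    return sum(a * b for a, b in zip(n, point)) - d


def room_center_2d(x_walls, y_walls):
    """Midpoint between two opposing wall pairs, given as offsets along x and y."""
    return ((x_walls[0] + x_walls[1]) / 2.0, (y_walls[0] + y_walls[1]) / 2.0)


if __name__ == "__main__":
    # A marker at (2, 1, 1.5) facing along +x defines the wall plane x = 2.
    wall = plane_from_marker((2.0, 1.0, 1.5), (2.0, 0.0, 0.0))
    # A second marker assumed to be on the same wall, with a noisy estimate.
    print(point_to_plane_residual((2.05, 3.0, 1.2), wall))
    print(room_center_2d((0.0, 4.0), (1.0, 3.0)))
```

In a factor-graph optimizer, residuals of this kind would be minimized together with the reprojection and pose errors, which is how semantic relations can pull the geometric estimate toward a consistent map.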

Empirical validation uses a dataset collected with legged robots traversing various indoor scenarios. Compared to existing SLAM systems such as UcoSLAM and ORB-SLAM 3.0, the framework achieves lower trajectory error, measured by the root-mean-square error (RMSE) and standard deviation (STD) of the trajectory error. In particular, it is reported to be more robust in complex environments, with benefits visible both in the quantitative metrics and in the qualitative semantic map outputs.
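
For readers unfamiliar with the metrics, the sketch below shows one common way to compute the RMSE and standard deviation of a trajectory's translational error in Python. It assumes the estimated and ground-truth trajectories are already time-synchronized and aligned in the same frame, and it uses the population standard deviation; the paper's exact evaluation protocol is not given here, so both choices, and the function name `ate_rmse_std`, are assumptions.

```python
import math
import statistics


def ate_rmse_std(estimated, ground_truth):
    """Return (RMSE, STD) of per-pose translational errors.

    Both inputs are sequences of (x, y, z) positions of equal length,
    assumed to be already associated and aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    # Euclidean distance between each estimated pose and its ground truth.
    errors = [math.dist(p, q) for p, q in zip(estimated, ground_truth)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    std = statistics.pstdev(errors)  # population standard deviation
    return rmse, std


if __name__ == "__main__":
    est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    gt = [(0.0, 0.0, 0.0), (1.0, 0.3, 0.4), (2.0, 0.0, 1.0)]
    rmse, std = ate_rmse_std(est, gt)
    print(f"RMSE = {rmse:.3f} m, STD = {std:.3f} m")
```

RMSE penalizes large deviations more heavily than the mean error does, while STD indicates how much the error varies along the trajectory.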

The paper also preliminarily investigates the potential of imperceptible fiducial markers, suggesting a direction for future research in maintaining aesthetic harmony in environments while utilizing visual markers for navigation and mapping.

Implications and Future Work

The paper advances robotic navigation and environmental understanding by increasing the semantic depth and accuracy of visual SLAM. The practical implications are most relevant for robots operating in dynamic or semantically complex indoor environments, where traditional SLAM systems might struggle.

On the theoretical side, the framework integrates topological constraints with semantic information in a single representation. This enriches the data representation and opens pathways for further work on advanced perception systems, potentially benefiting areas such as indoor robot navigation, autonomous driving, and augmented reality.

Future work could involve supporting additional sensor types, improving computational efficiency, and using the semantic graphs for higher-order reasoning tasks. It may also include further study of invisible markers and their integration into existing SLAM frameworks, with a view to unobtrusive marker design and deployment.

The paper provides a foundation for further work on vision-based semantic mapping in computer vision and robotics.
