Semantic Segmentation of Underwater Imagery: Dataset and Benchmark (2004.01241v3)

Published 2 Apr 2020 in cs.CV and eess.IV

Abstract: In this paper, we present the first large-scale dataset for semantic Segmentation of Underwater IMagery (SUIM). It contains over 1500 images with pixel annotations for eight object categories: fish (vertebrates), reefs (invertebrates), aquatic plants, wrecks/ruins, human divers, robots, and sea-floor. The images have been rigorously collected during oceanic explorations and human-robot collaborative experiments, and annotated by human participants. We also present a benchmark evaluation of state-of-the-art semantic segmentation approaches based on standard performance metrics. In addition, we present SUIM-Net, a fully-convolutional encoder-decoder model that balances the trade-off between performance and computational efficiency. It offers competitive performance while ensuring fast end-to-end inference, which is essential for its use in the autonomy pipeline of visually-guided underwater robots. In particular, we demonstrate its usability benefits for visual servoing, saliency prediction, and detailed scene understanding. With a variety of use cases, the proposed model and benchmark dataset open up promising opportunities for future research in underwater robot vision.

Authors (8)

Md Jahidul Islam
Chelsey Edge
Yuyang Xiao
Peigen Luo
Muntaqim Mehtaz
Christopher Morse
Sadman Sakib Enan
Junaed Sattar

Citations (169)


Summary

The paper presents the SUIM dataset and introduces SUIM-Net, advancing semantic segmentation in underwater imagery.
It benchmarks state-of-the-art models such as UNet and DeepLab using standard metrics, namely mean IoU and F-scores.
This work enhances real-time visual interpretation for underwater robots and sets the stage for future research in autonomous exploration.

Semantic Segmentation of Underwater Imagery: Dataset and Benchmark Analysis

The paper "Semantic Segmentation of Underwater Imagery: Dataset and Benchmark" presents the SUIM dataset, a collection of over 1500 underwater images with pixel-level annotations, together with SUIM-Net, an efficient semantic segmentation model. The dataset annotates eight object categories relevant to underwater exploration: fish, reefs, aquatic plants, wrecks, human divers, robots, sea-floor, and the background waterbody. These annotations support the precise scene understanding that visually-guided underwater robots require.

Dataset and Objectives

The SUIM dataset fills a gap in the availability of large-scale annotated underwater imagery for semantic segmentation. The images were captured during oceanic explorations and human-robot collaborative experiments, and are intended to support applications such as exploration, surveying, and human-robot cooperation. Its pixel-level annotations cover a broader set of object categories than earlier work, encouraging research beyond application-specific tasks such as coral reef classification or fish detection.
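Pixel-level annotations like these are often distributed as color-coded mask images that must be converted to integer class labels before training or evaluation. The sketch below assumes a 3-bit RGB coding in which each channel is either 0 or 255 and the (R, G, B) bits form a class code from 0 to 7; the specific code-to-class table is an assumption and should be checked against the dataset's own documentation. The function names are hypothetical.

```python
import numpy as np

# ASSUMED class coding: (R, G, B) bits -> 3-bit class code. Verify against
# the dataset release before relying on this table.
CLASS_NAMES = {
    0b000: "BW",  # background waterbody
    0b001: "HD",  # human divers
    0b010: "PF",  # aquatic plants / sea-grass
    0b011: "WR",  # wrecks / ruins
    0b100: "RO",  # robots
    0b101: "RI",  # reefs / invertebrates
    0b110: "FV",  # fish / vertebrates
    0b111: "SR",  # sea-floor / rocks
}


def rgb_mask_to_labels(mask: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Convert an (H, W, 3) uint8 color mask into an (H, W) array of class codes 0..7."""
    # Binarize each channel so slightly noisy colors still map to 0 or 1.
    bits = (mask >= threshold).astype(np.uint8)
    # R is the high bit, B the low bit.
    return (bits[..., 0] << 2) | (bits[..., 1] << 1) | bits[..., 2]


def labels_to_onehot(labels: np.ndarray, num_classes: int = 8) -> np.ndarray:
    """Stack per-class binary masks into an (H, W, num_classes) array."""
    return np.eye(num_classes, dtype=np.uint8)[labels]
```

For example, a pure-blue pixel decodes to code 1 and a yellow pixel (R and G set) decodes to code 6. Thresholding each channel, rather than matching exact colors, is a design choice that tolerates small color deviations in the mask files.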

Benchmark and Model Evaluation

The paper benchmarks state-of-the-art (SOTA) semantic segmentation approaches using metrics that assess region similarity and object boundary localization accuracy, namely $\mathcal{F}$ scores and mean IoU (mIoU) scores. Among the evaluated models, UNet and DeepLab perform strongly, suggesting that they handle the complex scenes common in underwater environments well.
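To make the two metrics concrete, the sketch below computes a pixel-wise IoU and F1 score (the harmonic mean of precision and recall) for each class and averages them over the classes present in either mask. This is a generic formulation under stated assumptions; the paper's exact protocol (for instance, how classes are averaged or whether a boundary-based F-measure is used) may differ, and the function names are hypothetical.

```python
from typing import Optional, Tuple

import numpy as np


def iou_and_f1(pred: np.ndarray, gt: np.ndarray) -> Optional[Tuple[float, float]]:
    """Pixel-wise IoU and F1 for one pair of binary masks; None if the class is absent from both."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()   # correctly labeled pixels
    fp = np.logical_and(pred, ~gt).sum()  # predicted but not in ground truth
    fn = np.logical_and(~pred, gt).sum()  # in ground truth but missed
    if tp + fp + fn == 0:
        return None
    iou = tp / (tp + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)      # algebraically equal to 2PR / (P + R)
    return float(iou), float(f1)


def mean_scores(pred_labels: np.ndarray, gt_labels: np.ndarray, num_classes: int) -> Tuple[float, float]:
    """Mean IoU and mean F1 over the classes that appear in either label map."""
    scores = [iou_and_f1(pred_labels == c, gt_labels == c) for c in range(num_classes)]
    present = [s for s in scores if s is not None]
    miou = float(np.mean([s[0] for s in present]))
    mf1 = float(np.mean([s[1] for s in present]))
    return miou, mf1
```

Skipping classes that appear in neither mask avoids rewarding or penalizing a model for categories that are absent from a given image; other conventions exist.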

In addition, SUIM-Net delivers competitive segmentation performance with fast inference, which is indispensable for real-time operation on autonomous underwater robots. It uses a fully-convolutional encoder-decoder design with skip connections and residual learning, which lets information pass directly between non-adjacent layers. Two variants are proposed: SUIM-Net$_{RSB}$, which prioritizes inference speed, and SUIM-Net$_{VGG}$, which prioritizes segmentation performance.
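The two ideas named above, residual learning and encoder-decoder skip connections, can be illustrated with a toy NumPy forward pass. This is a minimal sketch, not SUIM-Net's architecture: it assumes 1x1 convolutions, random untrained weights, a single downsampling stage, and hypothetical parameter names. A real model would use larger kernels, normalization, learned weights, and several stages.

```python
import numpy as np


def conv1x1(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Pointwise convolution: x is (H, W, C_in), w is (C_in, C_out)."""
    return x @ w


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Residual learning: y = ReLU(x + F(x)), where F is conv -> ReLU -> conv."""
    return relu(x + conv1x1(relu(conv1x1(x, w1)), w2))


def max_pool2x2(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling; assumes H and W are even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))


def upsample2x(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor upsampling by a factor of 2."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def tiny_encoder_decoder(x: np.ndarray, params: dict) -> np.ndarray:
    # Encoder: residual block at full resolution, then downsample.
    skip = residual_block(x, params["w1"], params["w2"])
    bottleneck = residual_block(max_pool2x2(skip), params["w3"], params["w4"])
    # Decoder: upsample and concatenate the encoder features (skip connection).
    merged = np.concatenate([upsample2x(bottleneck), skip], axis=-1)
    # Per-pixel class scores.
    return conv1x1(merged, params["w_out"])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    channels, num_classes = 4, 8
    params = {k: rng.normal(size=(channels, channels)) for k in ("w1", "w2", "w3", "w4")}
    params["w_out"] = rng.normal(size=(2 * channels, num_classes))
    x = rng.normal(size=(8, 8, channels))
    label_map = tiny_encoder_decoder(x, params).argmax(axis=-1)  # (8, 8) class IDs
    print(label_map.shape)
```

The skip connection reinjects full-resolution encoder features into the decoder, which helps recover fine object boundaries lost during downsampling. With zero weights in F, a residual block reduces to ReLU(x), which is why such blocks are easy to optimize.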

Implications and Future Research Directions

This work lays a foundation for future advances in underwater robot vision. The SUIM dataset and SUIM-Net provide resources for visual attention modeling and semantic saliency estimation. These capabilities matter in autonomous operation, where a robot must understand the spatial relationships among multiple entities in a scene.
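As a simple illustration of how a segmentation output can feed attention and spatial reasoning, the sketch below turns a label map into a binary saliency mask and computes the normalized offset of a target class's centroid from the image center, a quantity a visual servoing controller could try to drive toward zero. This is a rule-based stand-in, not the paper's saliency or servoing method. The integer class IDs in `SALIENT_CLASSES` (1 = human divers, 3 = wrecks, 4 = robots, 6 = fish) are hypothetical choices for illustration.

```python
from typing import Optional, Sequence, Tuple

import numpy as np

# HYPOTHETICAL class IDs treated as salient foreground.
SALIENT_CLASSES = (1, 3, 4, 6)


def saliency_mask(labels: np.ndarray, salient: Sequence[int] = SALIENT_CLASSES) -> np.ndarray:
    """Boolean mask marking pixels whose class is considered salient."""
    return np.isin(labels, salient)


def target_offset(labels: np.ndarray, target_class: int) -> Optional[Tuple[float, float]]:
    """Normalized (dx, dy) of the class centroid from the image center, each in [-1, 1].

    Returns None if the class is absent. Assumes the image is at least 2x2.
    """
    ys, xs = np.nonzero(labels == target_class)
    if xs.size == 0:
        return None
    h, w = labels.shape
    cx, cy = (w - 1) / 2, (h - 1) / 2
    return float((xs.mean() - cx) / cx), float((ys.mean() - cy) / cy)
```

For a 3x3 label map with a single diver pixel in the top-right corner, the offset is (1.0, -1.0): the target lies at the right edge and top edge of the frame.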

Looking ahead, further research is encouraged on visual question answering and spatio-temporal search in underwater environments. These areas remain relatively untapped and offer promising avenues for advancing autonomous underwater systems.

Final Remarks

This paper makes a notable contribution by formalizing a dataset and a model that support detailed scene understanding in underwater settings. Through its benchmark evaluation and SUIM-Net, it provides practical tools for visual perception research in underwater robotics. Future work building on SUIM is likely to extend both theoretical and applied knowledge, with potential impact on the automation of underwater exploration and human-robot cooperation missions.