
Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning (2105.00957v2)

Published 3 May 2021 in cs.CV

Abstract: Weakly supervised segmentation requires assigning a label to every pixel based on training instances with partial annotations such as image-level tags, object bounding boxes, labeled points and scribbles. This task is challenging, as coarse annotations (tags, boxes) lack precise pixel localization whereas sparse annotations (points, scribbles) lack broad region coverage. Existing methods tackle these two types of weak supervision differently: Class activation maps are used to localize coarse labels and iteratively refine the segmentation model, whereas conditional random fields are used to propagate sparse labels to the entire image. We formulate weakly supervised segmentation as a semi-supervised metric learning problem, where pixels of the same (different) semantics need to be mapped to the same (distinctive) features. We propose 4 types of contrastive relationships between pixels and segments in the feature space, capturing low-level image similarity, semantic annotation, co-occurrence, and feature affinity. They act as priors; the pixel-wise feature can be learned from training images with any partial annotations in a data-driven fashion. In particular, unlabeled pixels in training images participate not only in data-driven grouping within each image, but also in discriminative feature learning within and across images. We deliver a universal weakly supervised segmenter with significant gains on Pascal VOC and DensePose. Our code is publicly available at https://github.com/twke18/SPML.

Authors (3)
  1. Tsung-Wei Ke (10 papers)
  2. Jyh-Jing Hwang (13 papers)
  3. Stella X. Yu (65 papers)
Citations (71)

Summary

  • The paper introduces SPML, a novel pixel-to-segment contrastive framework that learns four types of relationships to improve weakly supervised segmentation.
  • It achieves significant performance gains, reaching up to 74.2–76.1% mIoU on Pascal VOC and 77.1% on DensePose in setups with sparse annotations.
  • The method simplifies training by effectively using partial annotations, offering a scalable solution for complex real-world segmentation tasks.

Universal Weakly Supervised Segmentation by Pixel-to-Segment Contrastive Learning

The paper presents a comprehensive approach to weakly supervised semantic segmentation through a novel framework called Semi-supervised Pixel-wise Metric Learning (SPML). The method addresses the setting where precise pixel annotations are unavailable and the model must instead rely on partial annotations such as image-level tags, bounding boxes, labeled points, and scribbles. The authors propose a unified pixel-to-segment contrastive learning model that significantly advances the flexibility and efficacy of semantic segmentation under a wide range of weak supervision conditions.

Methodology Overview

The framework is based on the concept of contrastive learning in metric spaces, where four types of contrastive relationships between pixels and segments are explored:

  1. Low-level image similarity: Pixels are grouped into visually coherent regions produced by low-level over-segmentation, even though such regions may mix heterogeneous semantics.
  2. Semantic annotation: Pixels are attracted to segments carrying their semantic label, whether that label comes directly from points and scribbles or from pseudo labels inferred via class activation maps (CAM) for image- or box-level annotations.
  3. Semantic co-occurrence: This relationship is based on the premise that segments appearing in images with common semantic classes ought to be considered alike.
  4. Feature affinity: Pixels and segments that lie close in the learned feature space are assumed to share semantics even across images; retrieving such nearest-neighbor segments yields smoother, more consistent segmentations.
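Each of the four relationships above can be expressed as a set of positive (attracting) segments for every pixel, with all other segments serving as negatives. The following is a minimal, hypothetical sketch of such a pixel-to-segment contrastive loss; the function name, the temperature value, and the NumPy formulation are illustrative assumptions, not the paper's actual implementation (see the authors' repository for that).

```python
import numpy as np

def pixel_to_segment_contrastive_loss(pixel_feats, segment_feats, pos_mask,
                                      temperature=0.3):
    """Hypothetical sketch of a pixel-to-segment contrastive loss.

    pixel_feats:   (P, D) pixel embeddings.
    segment_feats: (S, D) segment embeddings (e.g., mean-pooled pixel features).
    pos_mask:      (P, S) boolean; True where segment s is a positive for
                   pixel p under one of the four relationships. Every pixel
                   must have at least one positive segment.
    """
    # Cosine similarity between every pixel and every segment.
    pf = pixel_feats / np.linalg.norm(pixel_feats, axis=1, keepdims=True)
    sf = segment_feats / np.linalg.norm(segment_feats, axis=1, keepdims=True)
    sim = pf @ sf.T / temperature                      # (P, S)

    # Numerically stable softmax-style ratio: positives over all segments.
    exp_sim = np.exp(sim - sim.max(axis=1, keepdims=True))
    pos = (exp_sim * pos_mask).sum(axis=1)
    denom = exp_sim.sum(axis=1)
    return float(np.mean(-np.log(pos / denom + 1e-12)))
```

In the full method, one such mask would be built per relationship (low-level similarity, annotation, co-occurrence, affinity) and the resulting losses combined, so that unlabeled pixels still receive a training signal from every term.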

Performance Evaluation

The implementation of SPML yields substantial gains across standard datasets like Pascal VOC and DensePose compared to state-of-the-art (SOTA) methods. A detailed comparison indicates SPML achieves performance gains of up to 24.7% in scenarios with the sparsest annotations, such as points and scribbles. In terms of specific benchmarks:

  • Pascal VOC: Using image tags, bounding boxes, or scribbles, SPML consistently outperforms previous methods, achieving up to 74.2% mIoU in validation and 76.1% in test datasets without CRF post-processing, and even higher numbers with it.
  • DensePose: With point annotations, SPML reaches 77.1% mIoU, a notable 12.9% improvement over the baseline.

Implications and Future Directions

The SPML framework is practically applicable in scenarios where fine-grained annotations are infeasible due to resource constraints, thereby simplifying the training process. Its contrastive learning formulation provides a robust mechanism for propagating supervision within and across images, making it suitable for diverse real-world segmentation applications.

Future research directions might explore further refinement of the contrastive learning process, possibly amalgamating additional prior knowledge or alternative forms of weak supervision to enhance segmentation robustness and accuracy. Moreover, the SPML approach's universal applicability suggests potential expansion into broader tasks in computer vision, especially where partial annotations prevail.

In summary, SPML marks a significant advancement in weakly supervised semantic segmentation, offering a compelling approach that excels across varied annotation regimes. Its adaptability and efficacy underscore its value both as a methodological milestone and as a practical tool for complex visual tasks.
