
ACFNet: Attentional Class Feature Network for Semantic Segmentation (1909.09408v3)

Published 20 Sep 2019 in cs.CV

Abstract: Recent works have made great progress in semantic segmentation by exploiting richer context, most of which are designed from a spatial perspective. In contrast to previous works, we present the concept of class center which extracts the global context from a categorical perspective. This class-level context describes the overall representation of each class in an image. We further propose a novel module, named Attentional Class Feature (ACF) module, to calculate and adaptively combine different class centers according to each pixel. Based on the ACF module, we introduce a coarse-to-fine segmentation network, called Attentional Class Feature Network (ACFNet), which can be composed of an ACF module and any off-the-shelf segmentation network (base network). In this paper, we use two types of base networks to evaluate the effectiveness of ACFNet. We achieve new state-of-the-art performance of 81.85% mIoU on the Cityscapes dataset with only finely annotated data used for training.

Citations (254)

Summary

  • The paper introduces a novel Attentional Class Feature module that leverages class centers to capture class-level context for improved semantic segmentation.
  • It employs a Class Center Block and a Class Attention Block to integrate coarse segmentation outputs with high-level features, setting a new benchmark on Cityscapes.
  • The methodology refines intra-class feature consistency and suggests potential for advancing pixel-level prediction tasks in complex scenes.

ACFNet: Attentional Class Feature Network for Semantic Segmentation

The paper presents ACFNet, an Attentional Class Feature Network designed to enhance semantic segmentation by incorporating a novel approach to capturing contextual information. Unlike traditional methods that focus on spatial context, this work exploits class-level context through the introduction of the class center, which represents the aggregate feature of each category within an image. The methodological innovation lies in the Attentional Class Feature (ACF) module, which adaptively combines class centers according to each pixel's features, thereby refining the segmentation result.

The core concept of ACFNet is the class center, which captures the global context of each category in an image by aggregating the features of all pixels belonging to that category; a minimal sketch of this aggregation appears below. This approach contrasts with spatial-context strategies such as the Pyramid Pooling Module and Atrous Spatial Pyramid Pooling, which pool over spatial regions regardless of class membership. By conditioning on class, ACFNet avoids the confusion that can arise when pixels from different classes contribute uniformly to a pixel's context.
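To make the class-center idea concrete, here is a minimal PyTorch sketch (not the authors' code) that aggregates one center vector per class by taking a weighted mean of pixel features under a per-pixel class assignment. The function name and the soft-assignment formulation are illustrative assumptions.

```python
import torch

def class_centers(features: torch.Tensor, probs: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Aggregate a center vector per class from pixel features.

    features: (B, C, H, W) high-level feature map
    probs:    (B, K, H, W) per-pixel class weights -- one-hot ground
              truth during training, or a coarse softmax prediction
              when labels are unavailable
    returns:  (B, K, C) one center per class, the weighted mean of the
              features of the pixels (softly) assigned to that class
    """
    B, C, H, W = features.shape
    f = features.view(B, C, H * W)              # (B, C, N)
    p = probs.view(B, probs.shape[1], H * W)    # (B, K, N)
    centers = torch.bmm(p, f.transpose(1, 2))   # (B, K, C) weighted sums
    counts = p.sum(dim=2, keepdim=True)         # (B, K, 1) soft pixel counts
    return centers / (counts + eps)             # weighted mean per class
```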

The ACF module comprises two main components: the Class Center Block (CCB) and the Class Attention Block (CAB). The CCB approximates class centers during the test phase by leveraging high-level feature maps and coarse segmentation outputs. This obviates the need for ground truth labels at test time. Subsequently, the CAB utilizes these approximated class centers alongside coarse segmentation results to form attentional class features, making it possible for each pixel to attend selectively to relevant class information.
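Putting the two blocks together, a plausible sketch of the ACF module follows. The soft class-center computation (CCB) and the redistribution of centers to pixels via the coarse prediction (CAB) follow the mechanism described above; the single 1x1 fusion convolution and the absence of extra normalization layers are simplifying assumptions, as the paper may wrap these steps in additional convolutions.

```python
import torch
import torch.nn as nn

class ACFModule(nn.Module):
    """Sketch of the Attentional Class Feature module.

    CCB: approximate class centers by weighting high-level features
         with the coarse segmentation probabilities.
    CAB: let each pixel attend to those centers through its own coarse
         class distribution, producing an attentional class feature.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Fusion layer is an assumption; it merges the attentional
        # class feature back into the original feature map.
        self.fuse = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, features, coarse_logits):
        B, C, H, W = features.shape
        p = coarse_logits.softmax(dim=1).view(B, -1, H * W)  # (B, K, N)
        f = features.view(B, C, H * W)                       # (B, C, N)

        # CCB: one C-dim center per class (soft weighted mean).
        centers = torch.bmm(p, f.transpose(1, 2))            # (B, K, C)
        centers = centers / (p.sum(dim=2, keepdim=True) + 1e-6)

        # CAB: distribute centers back to pixels, using the coarse
        # class probabilities as per-pixel attention weights.
        acf = torch.bmm(centers.transpose(1, 2), p)          # (B, C, N)
        acf = acf.view(B, C, H, W)

        # Fuse attentional class features with the original features
        # before a fine classifier refines the coarse prediction.
        return self.fuse(torch.cat([features, acf], dim=1))
```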

The proposed ACFNet is evaluated on the Cityscapes dataset, achieving a mean Intersection over Union (mIoU) of 81.85%, a new state-of-the-art result at the time using only the finely annotated training data. Through a series of ablation studies, the authors demonstrate that both the class-center concept and the attentional combination of class centers contribute to the improvement. Compared to approaches that do not distinguish class-specific contexts, ACFNet significantly improves intra-class feature consistency and overall segmentation accuracy, and visualizations of feature similarity corroborate these quantitative findings.

The implications of ACFNet are notable, suggesting a shift in how context is exploited for semantic segmentation. By incorporating class-level information, the method enables more nuanced segmentation, particularly in scenes with complex interactions between object categories. Such class-aware features could extend beyond semantic segmentation, potentially benefiting other pixel-level prediction tasks in computer vision.

Looking forward, the incorporation of this categorical awareness could drive further advancements in segmentation models, paving the way for more robust and contextually aware AI systems. Future developments might focus on refining the ACF module and exploring its integration with other state-of-the-art network architectures to push the boundaries of performance on even more challenging datasets.