Where are the Blobs: Counting by Localization with Point Supervision (1807.09856v1)

Published 25 Jul 2018 in cs.CV

Abstract: Object counting is an important task in computer vision due to its growing demand in applications such as surveillance, traffic monitoring, and counting everyday objects. State-of-the-art methods use regression-based optimization where they explicitly learn to count the objects of interest. These often perform better than detection-based methods that need to learn the more difficult task of predicting the location, size, and shape of each object. However, we propose a detection-based method that does not need to estimate the size and shape of the objects and that outperforms regression-based methods. Our contributions are three-fold: (1) we propose a novel loss function that encourages the network to output a single blob per object instance using point-level annotations only; (2) we design two methods for splitting large predicted blobs between object instances; and (3) we show that our method achieves new state-of-the-art results on several challenging datasets including the Pascal VOC and the Penguins dataset. Our method even outperforms those that use stronger supervision such as depth features, multi-point annotations, and bounding-box labels.

Citations (188)


Summary

The paper introduces a novel loss function combining image, point, split, and false-positive components to guide the network in producing one blob per object instance.
The authors propose two effective blob splitting methods, namely the line split and watershed split, to separate merged instances in crowded scenes.
The LC-FCN model outperforms regression-based approaches, achieving state-of-the-art performance on challenging datasets like Pascal VOC, Trancos, and Penguins with minimal supervision.

An Examination of "Where are the Blobs: Counting by Localization with Point Supervision"

This paper presents a detection-based approach to object counting in computer vision that nonetheless outperforms regression-based models, which typically hold the advantage on this task. The authors introduce the Localization-based Counting Fully Convolutional Network (LC-FCN), trained with a loss function designed to make the network output a single blob per object instance using only point-level supervision.

Key Contributions

In this research, the authors advance three main contributions to the field:

Novel Loss Function: The paper introduces a novel loss function specifically designed to steer the network towards generating a single blob for each object instance. This function primarily depends on point-level annotations, rather than extensive per-pixel or bounding-box labels common in detection-based models. The loss function comprises four components: an image-level loss, a point-level loss, a split-level loss, and a false positive loss. The image-level and point-level losses enforce semantic segmentation from point-level supervision, while the split-level loss guides the model in distinguishing overlapping instances. The false-positive loss mitigates the generation of blobs not corresponding to any ground-truth annotations.
Blob Splitting Methods: The paper proposes two techniques for handling large blobs that erroneously cover multiple object instances: the line split and the watershed split. Together with the split-level loss, these techniques push each object instance toward its own distinct blob.
State-of-the-Art Performance: The proposed LC-FCN demonstrates superior performance across various challenging datasets, outclassing existing models that leverage stronger supervision. Specifically, it achieves new state-of-the-art results on datasets with notable complexity, such as Pascal VOC, Trancos, and Penguins.
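
The relationship between predicted blobs and point annotations described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a binary foreground mask has already been derived from the network output, it uses 4-connectivity, and the function names `label_blobs` and `triage_blobs` are hypothetical. It counts blobs and flags those containing no annotated point (the case the false-positive loss penalizes) or two or more points (the case the split-level loss and the splitting methods target).

```python
from collections import deque


def label_blobs(mask):
    """Label 4-connected foreground regions of a binary mask (list of lists of 0/1).

    Returns (labels, n): labels has the same shape as mask, with 0 for
    background and 1..n for each blob, numbered in row-major scan order.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    n = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                n += 1
                labels[r][c] = n
                queue = deque([(r, c)])
                # Breadth-first flood fill over the 4-neighbourhood.
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = n
                            queue.append((ny, nx))
    return labels, n


def triage_blobs(mask, points):
    """Relate predicted blobs to point annotations given as (row, col) pairs.

    Illustrative helper (an assumption, not from the paper): reports the
    predicted count and which blobs each kind of penalty would concern.
    """
    labels, n = label_blobs(mask)
    hits = {b: 0 for b in range(1, n + 1)}
    missed = 0
    for r, c in points:
        b = labels[r][c]
        if b:
            hits[b] += 1
        else:
            missed += 1  # annotated object with no foreground prediction
    return {
        "count": n,  # predicted object count = number of blobs
        "false_positive": sorted(b for b, k in hits.items() if k == 0),
        "needs_split": sorted(b for b, k in hits.items() if k >= 2),
        "missed_points": missed,
    }
```

In this sketch the predicted count is simply the number of blobs, which is why a blob covering two annotated objects, or a blob covering none, translates directly into a counting error.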

Evaluation and Results

LC-FCN was evaluated on several datasets, each presenting distinct challenges in object appearance, occlusion, and density:

Penguins Dataset: LC-FCN achieved lower counting error than models that use depth features, suggesting robustness in heavily occluded scenes.
Trancos Dataset: LC-FCN improved on prior results in both Mean Absolute Error (MAE) and Grid Average Mean Absolute Error (GAME), indicating its suitability for traffic monitoring, where vehicles are heavily occluded and vary in size.
PASCAL VOC 2007: LC-FCN achieved state-of-the-art counting results despite using weaker point-level annotations, showing that it makes effective use of minimal supervision compared with methods that rely on denser annotations.
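
The two metrics named for Trancos can be illustrated with a short Python sketch. MAE averages the absolute difference between predicted and true counts per image. GAME(L) splits each image into a 2^L by 2^L grid and sums the absolute count differences per cell, so it also penalizes objects counted in the wrong place. The version below is a simplified sketch: it assumes predictions are available as (row, col) object locations (for example, one location per predicted blob), whereas GAME is commonly computed from density maps. The function names are hypothetical.

```python
def mae(pred_counts, true_counts):
    """Mean absolute error between per-image predicted and true counts."""
    return sum(abs(p - t) for p, t in zip(pred_counts, true_counts)) / len(true_counts)


def grid_counts(points, height, width, level):
    """Count (row, col) points falling in each cell of a 2^level x 2^level grid."""
    cells = 2 ** level
    counts = [[0] * cells for _ in range(cells)]
    for r, c in points:
        gy = min(r * cells // height, cells - 1)
        gx = min(c * cells // width, cells - 1)
        counts[gy][gx] += 1
    return counts


def game(pred_points, true_points, height, width, level):
    """Grid Average Mean Absolute Error over a list of images.

    pred_points and true_points are lists (one entry per image) of
    (row, col) locations; all images are assumed to share one size.
    """
    total = 0
    for pred, true in zip(pred_points, true_points):
        pc = grid_counts(pred, height, width, level)
        tc = grid_counts(true, height, width, level)
        total += sum(abs(a - b)
                     for prow, trow in zip(pc, tc)
                     for a, b in zip(prow, trow))
    return total / len(true_points)
```

At level 0 the grid is a single cell, so GAME(0) reduces to MAE. Higher levels separate a model that counts correctly and localizes well from one that reaches the right total by offsetting errors in different regions.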

Implications and Future Directions

This research reduces the annotation burden associated with object detection and counting. Because LC-FCN needs only point-level annotations, it is practical in cost-sensitive domains where labeling is a significant investment. Its robustness to varying object sizes and densities also supports its use in real-world applications such as ecology, surveillance, and traffic analysis.

Future work could explore extending or combining this method with other advanced architectures, such as deeper networks or diversified forms of weak supervision, to enhance scalability and accuracy further. Additionally, refining the blob splitting heuristics might improve efficiency and effectiveness in dealing with varying object shapes and overlapping instances, which remains a challenging aspect of object counting.

In conclusion, "Where are the Blobs: Counting by Localization with Point Supervision" presents a significant step forward in detecting objects using minimal annotations, demonstrating both exceptional performance and promising scalability for multiple practical applications.
