Precise Detection in Densely Packed Scenes (1904.00853v3)

Published 1 Apr 2019 in cs.CV

Abstract: Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel, deep-learning based method for precise object detection, designed for such challenging settings. Our contributions include: (1) A layer for estimating the Jaccard index as a detection quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; finally, (3) an extensive, annotated data set, SKU-110K, representing packed retail environments, released for training and testing under such extreme settings. Detection tests on SKU-110K and counting tests on the CARPK and PUCPR+ show our method to outperform existing state-of-the-art with substantial margins. The code and data will be made available on www.github.com/eg4000/SKU110K_CVPR19.

Authors (7)

Eran Goldman (2 papers)
Roei Herzig (34 papers)
Aviv Eisenschtat (2 papers)
Oria Ratzon (2 papers)
Itsik Levi (1 paper)
Jacob Goldberger (41 papers)
Tal Hassner (48 papers)

Citations (177)

Summary

The paper introduces a Soft-IoU layer that estimates the Jaccard index (IoU) between each predicted bounding box and the true box, and uses it as a detection quality score in densely packed scenes.
It employs an EM-Merger unit using a Mixture of Gaussians to resolve ambiguities in overlapping detections.
The approach is validated on the SKU-110K dataset, with a reported 10% improvement in average precision at IoU=0.75 over baseline methods, and it achieves lower counting error (MAE and RMSE) on the CARPK and PUCPR+ datasets.

Precise Detection in Densely Packed Scenes

This paper presents a deep learning-based method for detecting objects in densely packed scenes, which remain challenging even for state-of-the-art detection systems. It targets environments such as retail shelves, where many objects are positioned in close proximity and often appear similar or identical.

Key Contributions

Soft-IoU Layer: The paper introduces a Soft-IoU layer designed to estimate the Jaccard index between detected and ground-truth bounding boxes. This score complements the traditional object/no-object confidence by measuring how well a predicted box overlaps the true location, which is crucial in crowded settings.
EM-Merger Unit: The authors employ an EM-based method to resolve ambiguities in overlapping detections. Detections are represented as a Mixture of Gaussians (MoG) and clustered, using the Soft-IoU quality scores, so that overlapping detections of the same object can be merged and individual instances separated in tightly packed scenes.
SKU-110K Dataset: The research is bolstered by the introduction of a new, extensively annotated dataset, SKU-110K, featuring images of densely packed retail environments. This dataset is critical for training models to perform in such extreme settings and represents a significant step forward in benchmarking object detection in these scenarios.
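
To make the Soft-IoU idea concrete, the sketch below computes the Jaccard index of two axis-aligned boxes and a cross-entropy objective that pulls a predicted quality score toward that index. The box layout `(x1, y1, x2, y2)`, the function names, and the choice of binary cross-entropy as the training objective are assumptions made for illustration; this summary does not specify how the layer is trained.

```python
import math

def iou(box_a, box_b):
    """Jaccard index of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def soft_iou_loss(predicted_scores, iou_targets, eps=1e-7):
    """Mean binary cross-entropy between predicted scores and IoU targets.

    Assumed objective for illustration: each predicted score lies in
    (0, 1) and each target is the IoU of that detection with its
    ground-truth box.
    """
    total = 0.0
    for c, t in zip(predicted_scores, iou_targets):
        c = min(max(c, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(c) + (1.0 - t) * math.log(1.0 - c)
    return total / len(predicted_scores)

# A detection shifted by half a box width overlaps its target by one third.
target = iou((0, 0, 2, 2), (1, 0, 3, 2))
loss = soft_iou_loss([0.4], [target])
```

Unlike a binary object/no-object label, the target here is a continuous value in [0, 1], so the loss is smallest when the predicted score equals the actual overlap.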

Empirical Results

The proposed method outperforms existing state-of-the-art object detectors when tested on SKU-110K. In particular, it achieves a 10% improvement in average precision at IoU=0.75 compared to baseline methods. On the object counting tasks of the CARPK and PUCPR+ datasets, the approach also surpasses recent methods designed specifically for counting, achieving lower mean absolute error (MAE) and root mean squared error (RMSE).
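
For readers unfamiliar with the counting metrics, the following sketch shows how MAE and RMSE are computed from per-image object counts. The function name and the sample counts are made up for illustration and are not results from the paper.

```python
import math

def counting_errors(predicted_counts, true_counts):
    """Return (MAE, RMSE) over per-image object counts."""
    errors = [p - t for p, t in zip(predicted_counts, true_counts)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical counts for three images (illustrative only).
mae, rmse = counting_errors([10, 12, 8], [10, 10, 10])
```

RMSE penalizes large per-image miscounts more heavily than MAE, which is why both are usually reported together.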

Implications and Future Directions

The introduction of the SKU-110K dataset facilitates a deeper exploration of object detection in densely populated scenes, a context that was underrepresented in previous benchmarks. The proposed Soft-IoU layer and EM-Merger unit contribute to improved detection practices by addressing challenges endemic to these environments, such as overlapping objects and ambiguous detections.
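
The way a Gaussian mixture can disambiguate overlapping detections can be sketched with a small, simplified EM loop. This is not the authors' EM-Merger: it treats each detection as a weighted point at its box center, uses axis-aligned Gaussians, and expects the caller to supply the number of clusters through `init_means`. All names and parameters here are hypothetical.

```python
import math

def em_merge(centers, scores, init_means, n_iters=20, min_var=1e-3):
    """Cluster score-weighted 2-D detection centers with EM.

    centers: list of (x, y) box centers; scores: per-detection quality
    weights (for example Soft-IoU scores); init_means: one starting
    (x, y) per cluster. Returns (means, variances, labels).
    """
    k = len(init_means)
    means = [tuple(m) for m in init_means]
    variances = [(1.0, 1.0)] * k
    priors = [1.0 / k] * k
    total_score = sum(scores)
    resp = []
    for _ in range(n_iters):
        # E-step: responsibility of each cluster for each detection.
        resp = []
        for x, y in centers:
            logp = []
            for j in range(k):
                (mx, my), (vx, vy) = means[j], variances[j]
                logp.append(math.log(priors[j])
                            - 0.5 * (math.log(vx) + math.log(vy))
                            - 0.5 * ((x - mx) ** 2 / vx + (y - my) ** 2 / vy))
            top = max(logp)  # subtract the max for numerical stability
            p = [math.exp(v - top) for v in logp]
            norm = sum(p)
            resp.append([v / norm for v in p])
        # M-step: re-estimate each Gaussian, weighting by quality score.
        for j in range(k):
            w = [scores[i] * resp[i][j] for i in range(len(centers))]
            wsum = sum(w)
            if wsum <= 0.0:
                continue  # cluster received no weight; keep old values
            mx = sum(wi * c[0] for wi, c in zip(w, centers)) / wsum
            my = sum(wi * c[1] for wi, c in zip(w, centers)) / wsum
            vx = sum(wi * (c[0] - mx) ** 2 for wi, c in zip(w, centers)) / wsum
            vy = sum(wi * (c[1] - my) ** 2 for wi, c in zip(w, centers)) / wsum
            means[j] = (mx, my)
            variances[j] = (max(vx, min_var), max(vy, min_var))
            priors[j] = max(wsum / total_score, 1e-12)
    # Hard assignment: each detection goes to its most responsible cluster.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return means, variances, labels

# Six overlapping detections around two objects (illustrative values).
centers = [(0.0, 0.0), (0.2, 0.1), (-0.1, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
scores = [0.9, 0.8, 0.7, 0.9, 0.6, 0.8]
means, variances, labels = em_merge(centers, scores, [(1.0, 1.0), (4.0, 4.0)])
```

Detections that land in the same cluster can then be merged into a single box, with higher-scoring detections pulling the cluster mean toward themselves.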

Theoretical implications include the reconsideration of detection frameworks to incorporate overlap-sensitive measures like Soft-IoU, while practical implications involve the potential for improved automated retail and inventory systems, where accurate item detection is paramount.

Future work may optimize the EM-Merger for computational efficiency, so that its run-time performance matches or surpasses that of existing approaches. Overlap-sensitive detection mechanisms of this kind could also be integrated into systems beyond retail, such as traffic surveillance and urban management.

In conclusion, the contributions of this paper not only address pressing challenges in object detection for densely packed environments but also set the stage for further advancements in AI-driven detection systems. The introduction of the SKU-110K dataset provides a pivotal resource for future research and development in this domain.
