Amodal Ground Truth and Completion in the Wild (2312.17247v2)
Abstract: This paper studies amodal image segmentation: predicting entire object segmentation masks including both visible and invisible (occluded) parts. In previous work, the amodal segmentation ground truth on real images is usually produced by manual annotation and is thus subjective. In contrast, we use 3D data to establish an automatic pipeline to determine authentic ground truth amodal masks for partially occluded objects in real images. This pipeline is used to construct an amodal completion evaluation benchmark, MP3D-Amodal, consisting of a variety of object categories and labels. To better handle the amodal completion task in the wild, we explore two architecture variants: a two-stage model that first infers the occluder, followed by amodal mask completion; and a one-stage model that exploits the representation power of Stable Diffusion for amodal segmentation across many categories. Without bells and whistles, our method achieves new state-of-the-art performance on amodal segmentation datasets that cover a large variety of objects, including COCOA and our new MP3D-Amodal dataset. The dataset, model, and code are available at https://www.robots.ox.ac.uk/~vgg/research/amodal/.
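The abstract does not spell out how the 3D data yields amodal masks, so the following is only a minimal sketch of one plausible way to do it, not the authors' pipeline. It assumes two depth maps from the same camera are available: one rendering the object alone and one rendering the full scene. The function names `amodal_from_depth` and `occlusion_rate`, the tolerance `tol`, and the toy arrays are all hypothetical.

```python
import numpy as np


def amodal_from_depth(object_depth, scene_depth, tol=1e-3):
    """Split an object's projected footprint into visible and occluded parts.

    Assumed inputs (hypothetical, not taken from the paper):
      object_depth: (H, W) depth of the object rendered alone,
                    np.inf where the object does not project.
      scene_depth:  (H, W) depth of the full scene from the same camera.
    """
    # Amodal mask: every pixel the object covers, occluded or not.
    amodal = np.isfinite(object_depth)
    # Visible (modal) mask: the object is the closest surface at that pixel.
    visible = amodal & (object_depth <= scene_depth + tol)
    # Occluded part: inside the footprint but hidden by something closer.
    occluded = amodal & ~visible
    return amodal, visible, occluded


def occlusion_rate(amodal, occluded):
    """Fraction of the amodal mask that is hidden (0.0 for an empty mask)."""
    area = int(amodal.sum())
    return float(occluded.sum()) / area if area else 0.0


# Toy 4x4 scene: an object at depth 2.0 spans rows 1-2 across all columns,
# and an occluder at depth 1.0 covers columns 2-3.
object_depth = np.full((4, 4), np.inf)
object_depth[1:3, :] = 2.0

scene_depth = np.full((4, 4), 5.0)   # background
scene_depth[1:3, :] = 2.0            # the object itself
scene_depth[:, 2:] = 1.0             # occluder in front of the right half

amodal, visible, occluded = amodal_from_depth(object_depth, scene_depth)
rate = occlusion_rate(amodal, occluded)
```

In this toy case the object covers 8 pixels, of which 4 are hidden, so a partially occluded object is one whose occluded mask is non-empty while its visible mask is also non-empty.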