Self-Balanced R-CNN for Instance Segmentation (2404.16633v1)
Abstract: Current state-of-the-art two-stage models for the instance segmentation task suffer from several types of imbalance. In this paper, we address the Intersection over Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, introduces new loop mechanisms for bounding box and mask refinement. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, which is caused by a non-uniform integration of low- and high-level features from the backbone layers. In addition, redesigning the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and gives more insight into the connection between the task to be solved and the layers used. Moreover, our SBR-CNN model yields the same or even greater improvements when adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 backbone, evaluated on the COCO minival 2017 dataset, our model reaches 45.3% and 41.5% AP for object detection and instance segmentation, respectively, with 12 epochs of training and without extra tricks. The code is available at https://github.com/IMPLabUniPr/mmdetection/tree/sbr_cnn
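To make the notion of an IoU distribution over positive RoIs concrete, the following is a minimal sketch and not code from the SBR-CNN repository. It assumes axis-aligned boxes in `(x1, y1, x2, y2)` format and a positive threshold of 0.5; the helper names `iou` and `iou_histogram` and the bin edges are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_histogram(
    rois: Sequence[Box],
    gt: Box,
    edges: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),  # illustrative bins
) -> List[int]:
    """Count positive RoIs per IoU bin; bin i covers [edges[i], edges[i+1])."""
    counts = [0] * len(edges)
    for roi in rois:
        value = iou(roi, gt)
        if value < edges[0]:
            continue  # below the assumed positive threshold: not counted
        # Index of the last bin edge that does not exceed the IoU value.
        idx = max(i for i, edge in enumerate(edges) if value >= edge)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    gt_box = (0, 0, 10, 10)
    proposals = [(0, 0, 10, 10), (0, 0, 10, 5.5), (0, 0, 10, 7.5), (0, 0, 10, 4)]
    print(iou_histogram(proposals, gt_box))
```

A histogram of this kind, computed over the RoIs fed to the second stage, is one way to check whether high-IoU positives are under-represented during training.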