Referring Camouflaged Object Detection (2306.07532v2)
Abstract: We consider the problem of referring camouflaged object detection (Ref-COD), a new task that aims to segment specified camouflaged objects based on a small set of referring images containing salient target objects. We first assemble a large-scale dataset, called R2C7K, which consists of 7K images covering 64 object categories in real-world scenarios. We then develop a simple yet strong dual-branch framework, dubbed R2CNet, with a reference branch that embeds the common representations of target objects from the referring images and a segmentation branch that identifies and segments camouflaged objects under the guidance of these common representations. In particular, we design a Referring Mask Generation module to produce a pixel-level prior mask and a Referring Feature Enrichment module to strengthen the model's ability to identify the specified camouflaged objects. Extensive experiments show the superiority of our Ref-COD methods over their COD counterparts in segmenting specified camouflaged objects and identifying the main body of target objects. Our code and dataset are publicly available at https://github.com/zhangxuying1004/RefCOD.
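The dual-branch idea described in the abstract can be illustrated with a minimal NumPy sketch: the reference branch averages features of the referring images into a common representation, and a stand-in for the Referring Mask Generation module turns that representation into a pixel-level prior mask via cosine similarity. This is a conceptual toy under stated assumptions, not the actual R2CNet: the real model uses learned CNN features and trained fusion modules, and all function names here (`reference_branch`, `referring_mask_generation`) are hypothetical.

```python
import numpy as np

def reference_branch(ref_feats):
    """Embed a common representation of the target object
    by averaging the feature vectors of the referring images."""
    return ref_feats.mean(axis=0)  # shape: (C,)

def referring_mask_generation(img_feats, common_rep):
    """Toy stand-in for the Referring Mask Generation module:
    a pixel-level prior mask from cosine similarity between each
    spatial feature of the camouflaged image and the common representation."""
    c, h, w = img_feats.shape
    flat = img_feats.reshape(c, -1)                 # (C, H*W)
    sim = common_rep @ flat                         # (H*W,)
    sim /= np.linalg.norm(common_rep) * np.linalg.norm(flat, axis=0) + 1e-8
    return sim.reshape(h, w)                        # values in [-1, 1]

# Toy example: 5 referring images with 16-dim features,
# and a 16x8x8 feature map for the camouflaged image.
rng = np.random.default_rng(0)
ref_feats = rng.normal(size=(5, 16))
img_feats = rng.normal(size=(16, 8, 8))

common_rep = reference_branch(ref_feats)
prior_mask = referring_mask_generation(img_feats, common_rep)
assert prior_mask.shape == (8, 8)
```

In the paper's pipeline, a prior mask of this kind would then guide the segmentation branch (e.g. by modulating its features), which the Referring Feature Enrichment module refines further.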