TIDE: Test Time Few Shot Object Detection (2311.18358v1)

Published 30 Nov 2023 in cs.CV

Abstract: Few-shot object detection (FSOD) aims to extract semantic knowledge from a limited number of object instances of novel categories within a target domain. Recent advances in FSOD focus on fine-tuning the base model on a few objects via meta-learning or data augmentation. Despite their success, most of these methods rely on parametric readjustment to generalize to novel objects, which faces considerable challenges in Industry 5.0: (i) fine-tuning requires a non-trivial amount of time, and (ii) the parameters of the deployed model may be unavailable due to privilege protection, making fine-tuning impossible. These constraints naturally limit application in scenarios with real-time configuration requirements or in black-box settings. To tackle these challenges, we formalize a novel FSOD task, referred to as Test TIme Few Shot DEtection (TIDE), in which the model remains un-tuned during the configuration procedure. To that end, we introduce an asymmetric architecture for learning a support-instance-guided dynamic category classifier. Further, a cross-attention module and a multi-scale resizer are provided to enhance model performance. Experimental results on multiple few-shot object detection benchmarks reveal that the proposed TIDE significantly outperforms existing contemporary methods. The implementation code is available at https://github.com/deku-0621/TIDE
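
The abstract names a support-instance-guided dynamic category classifier with cross-attention but gives no implementation details here. Below is a minimal sketch of how such a component might look, assuming standard transformer-style cross-attention in PyTorch; the module name, dimensions, and the similarity-based scoring head are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch (not the paper's code): support embeddings act as class
# prototypes, query region features attend to them via cross-attention, and
# class scores are similarities to the prototypes, so no class-specific
# weights need fine-tuning at configuration time.
import torch
import torch.nn as nn


class SupportGuidedClassifier(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Query region features attend to support-instance embeddings.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.scale = dim ** -0.5

    def forward(self, query_feats: torch.Tensor, support_feats: torch.Tensor) -> torch.Tensor:
        # query_feats:   (B, N, D) region/query embeddings from the detector
        # support_feats: (B, K, D) one embedding per support instance (K novel classes)
        attended, _ = self.cross_attn(query_feats, support_feats, support_feats)
        query_feats = self.norm(query_feats + attended)
        # Dynamic classifier: logits are scaled similarities to the prototypes.
        return torch.einsum("bnd,bkd->bnk", query_feats, support_feats) * self.scale


# Usage: score 100 candidate regions against 5 novel classes without any tuning.
clf = SupportGuidedClassifier(dim=256)
scores = clf(torch.randn(2, 100, 256), torch.randn(2, 5, 256))
print(scores.shape)  # torch.Size([2, 100, 5])
```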

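The multi-scale resizer is likewise only named in the abstract. One plausible reading is that support crops are fed to the backbone at several resolutions and the resulting embeddings are aggregated; the sketch below follows that assumption, and the `backbone` argument, the scale set, and mean aggregation are all hypothetical choices rather than the paper's design.

```python
# A hedged sketch of a multi-scale resizer: resize each support crop to a few
# resolutions, pool one embedding per scale, and average across scales.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleResizer(nn.Module):
    def __init__(self, backbone: nn.Module, scales=(128, 224, 320)):
        super().__init__()
        self.backbone = backbone  # any feature extractor returning (B, D, h, w)
        self.scales = scales

    def forward(self, support_img: torch.Tensor) -> torch.Tensor:
        # support_img: (B, 3, H, W) batch of support crops
        feats = []
        for s in self.scales:
            x = F.interpolate(support_img, size=(s, s), mode="bilinear", align_corners=False)
            f = self.backbone(x)                     # (B, D, h, w) feature map
            feats.append(f.flatten(2).mean(dim=-1))  # global average pool -> (B, D)
        return torch.stack(feats, dim=0).mean(dim=0)  # scale-averaged embedding (B, D)
```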