FreeA: Human-object Interaction Detection using Free Annotation Labels (2403.01840v2)

Published 4 Mar 2024 in cs.CV and cs.AI

Abstract: Recent human-object interaction (HOI) detection methods depend on extensively annotated image datasets, which require a significant amount of manpower. In this paper, we propose a novel self-adaptive, language-driven HOI detection method, termed FreeA. This method leverages the adaptability of text-image models to generate latent HOI labels without requiring manual annotation. Specifically, FreeA aligns image features of human-object pairs with HOI text templates and employs a knowledge-based masking technique to suppress improbable interactions. Furthermore, FreeA introduces an interaction-correlation matching method that raises the likelihood of actions related to a specified action, thereby refining the generated HOI labels. Experiments on two benchmark datasets show that FreeA achieves state-of-the-art performance among weakly supervised HOI competitors. Our proposal gains +13.29 (159%↑) mAP and +17.30 (98%↑) mAP over the newest "Weakly" supervised model, and +7.19 (28%↑) mAP and +14.69 (34%↑) mAP over the latest "Weakly+" supervised model, on the HICO-DET and V-COCO datasets respectively, and is more accurate in localizing and classifying interactive actions. The source code will be made public.
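
The abstract outlines a three-step labeling pipeline: align image features of human-object pairs with HOI text templates, mask improbable interactions using prior knowledge, and raise the scores of correlated actions. Below is a minimal PyTorch sketch of that flow, assuming pre-extracted CLIP-style region and text embeddings; every name and the exact fusion rule here are hypothetical illustrations, since the abstract does not specify the implementation.

```python
import torch
import torch.nn.functional as F

def generate_hoi_labels(region_feats, text_feats, knowledge_mask,
                        verb_corr, temperature=0.07):
    """Produce soft HOI pseudo-labels for detected human-object pairs.

    region_feats:   (P, D) embeddings of human-object union regions.
    text_feats:     (V, D) embeddings of HOI text templates, e.g.
                    "a photo of a person riding a bicycle", one per verb.
    knowledge_mask: (P, V) binary mask; 0 marks verb-object combinations
                    ruled out by prior knowledge (e.g. "eat" + "car").
                    Each row is assumed to keep at least one verb.
    verb_corr:      (V, V) non-negative verb co-occurrence matrix used to
                    boost actions correlated with likely ones.
    """
    img = F.normalize(region_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    sim = img @ txt.t() / temperature                 # text-image alignment
    sim = sim.masked_fill(knowledge_mask == 0, float("-inf"))
    probs = sim.softmax(dim=-1)                       # per-pair verb scores
    # Interaction-correlation matching: spread probability mass toward
    # verbs that co-occur with high-scoring ones, then renormalize.
    boosted = probs + probs @ verb_corr
    return boosted / boosted.sum(dim=-1, keepdim=True)

# Toy usage with random tensors standing in for real features.
P, V, D = 4, 10, 512
labels = generate_hoi_labels(
    torch.randn(P, D), torch.randn(V, D),
    (torch.rand(P, V) > 0.3).long(), 0.1 * torch.rand(V, V))
```

The resulting (P, V) distribution would serve as the "free" supervision signal for training the detector in place of manually annotated verb labels.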
