
Dual Prototype Attention for Unsupervised Video Object Segmentation (2211.12036v3)

Published 22 Nov 2022 in cs.CV

Abstract: Unsupervised video object segmentation (VOS) aims to detect and segment the most salient object in a video. The primary techniques used in unsupervised VOS are 1) the collaboration of appearance and motion information; and 2) temporal fusion between different frames. This paper proposes two novel prototype-based attention mechanisms, inter-modality attention (IMA) and inter-frame attention (IFA), to incorporate these techniques via dense propagation across different modalities and frames. IMA densely integrates context information from different modalities through mutual refinement. IFA injects the global context of a video into the query frame, enabling full use of useful properties from multiple frames. Experimental results on public benchmark datasets demonstrate that the proposed approach outperforms all existing methods by a substantial margin. The two proposed components are also thoroughly validated through ablation studies.
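
To make the general idea of prototype-based attention more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how prototypes are formed or how attention is applied, so everything here is an assumption: the function names `make_prototypes` and `prototype_attention`, the hypothetical projection vectors `proj`, the softmax pooling, and the residual update are all illustrative choices. The sketch summarizes one modality's per-pixel features into a few prototype vectors, then lets every pixel of the other modality attend to those prototypes.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def make_prototypes(feats, proj):
    """Summarize N feature vectors into K prototypes (illustrative assumption).

    feats: list of N vectors of length C (e.g. per-pixel features).
    proj:  list of K vectors of length C (hypothetical learned projections).
    Each prototype is a weighted average of the features, with weights
    given by a softmax over the N pixels.
    """
    channels = len(feats[0])
    protos = []
    for p in proj:
        weights = softmax([dot(f, p) for f in feats])  # sums to 1 over pixels
        protos.append(
            [sum(w * f[c] for w, f in zip(weights, feats)) for c in range(channels)]
        )
    return protos


def prototype_attention(queries, protos):
    """Refine each query vector with context gathered from the prototypes.

    Uses scaled dot-product attention over the K prototypes and adds the
    result to the query (a residual update, assumed for this sketch).
    """
    scale = math.sqrt(len(queries[0]))
    refined = []
    for q in queries:
        attn = softmax([dot(q, p) / scale for p in protos])
        context = [sum(a * p[c] for a, p in zip(attn, protos)) for c in range(len(q))]
        refined.append([qc + cc for qc, cc in zip(q, context)])
    return refined


# Toy example: 3 "pixels" with 2 channels per modality, 2 prototypes.
appearance = [[0.5, 1.0], [1.5, -0.5], [0.0, 2.0]]
motion = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
proj = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical projection vectors

# Mutual refinement in the spirit of inter-modality attention:
refined_appearance = prototype_attention(appearance, make_prototypes(motion, proj))
refined_motion = prototype_attention(motion, make_prototypes(appearance, proj))
```

Because each pixel attends to K prototypes instead of all N pixels, the attention cost in this sketch grows with N·K rather than N². Under the same assumptions, an inter-frame variant would build the prototypes from the features of several reference frames and apply `prototype_attention` to the query frame.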

