D$^2$ST-Adapter: Disentangled-and-Deformable Spatio-Temporal Adapter for Few-shot Action Recognition (2312.01431v4)

Published 3 Dec 2023 in cs.CV

Abstract: Adapting pre-trained image models to the video modality has proven to be an effective strategy for robust few-shot action recognition. In this work, we explore the potential of adapter tuning in image-to-video model adaptation and propose a novel video adapter tuning framework, called Disentangled-and-Deformable Spatio-Temporal Adapter (D$^2$ST-Adapter). It features a lightweight design, low adaptation overhead, and powerful spatio-temporal feature adaptation capabilities. D$^2$ST-Adapter is structured with an internal dual-pathway architecture that enables built-in disentangled encoding of spatial and temporal features within the adapter, seamlessly integrating into the single-stream feature learning framework of pre-trained image models. In particular, we develop an efficient yet effective implementation of the D$^2$ST-Adapter, incorporating the specially devised anisotropic Deformable Spatio-Temporal Attention as its pivotal operation. This mechanism can be individually tailored for the two pathways with anisotropic sampling densities along the spatial and temporal domains in 3D spatio-temporal space, enabling disentangled encoding of spatial and temporal features while maintaining a lightweight design. Extensive experiments instantiating our method on both pre-trained ResNet and ViT backbones demonstrate its superiority over state-of-the-art methods. Our method is particularly well-suited to challenging scenarios where temporal dynamics are critical for action recognition. Code is available at https://github.com/qizhongtan/D2ST-Adapter.
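The abstract describes the adapter's core structure: a bottleneck adapter whose hidden representation flows through parallel spatial and temporal pathways, each sampling the 3D spatio-temporal volume with a different (anisotropic) density. The PyTorch sketch below illustrates that dual-pathway bottleneck layout only. It is a minimal illustration, not the authors' implementation: in place of the paper's anisotropic Deformable Spatio-Temporal Attention, it substitutes plain depthwise 3D convolutions with anisotropic kernels (spatial-only 1x3x3 vs. temporal-only 3x1x1), and the class name, bottleneck ratio, and sum fusion are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class DualPathwaySTAdapter(nn.Module):
    """Minimal sketch of a disentangled spatio-temporal bottleneck adapter.

    The paper's pivotal operation is an anisotropic Deformable
    Spatio-Temporal Attention; here we stand in depthwise 3D convolutions
    with anisotropic kernels purely to illustrate the dual-pathway,
    bottleneck-adapter structure. Names, ratios, and the sum fusion
    are illustrative assumptions, not the authors' code.
    """

    def __init__(self, dim: int, bottleneck_ratio: float = 0.25):
        super().__init__()
        hidden = int(dim * bottleneck_ratio)
        self.down = nn.Linear(dim, hidden)  # shared down-projection
        # Spatial pathway: dense sampling in space, none in time (1x3x3).
        self.spatial = nn.Conv3d(hidden, hidden, kernel_size=(1, 3, 3),
                                 padding=(0, 1, 1), groups=hidden)
        # Temporal pathway: dense sampling in time, none in space (3x1x1).
        self.temporal = nn.Conv3d(hidden, hidden, kernel_size=(3, 1, 1),
                                  padding=(1, 0, 0), groups=hidden)
        self.act = nn.GELU()
        self.up = nn.Linear(hidden, dim)    # shared up-projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, H, W, C) feature map from a frozen image backbone.
        z = self.act(self.down(x))              # (B, T, H, W, hidden)
        z = z.permute(0, 4, 1, 2, 3)            # (B, hidden, T, H, W)
        z = self.spatial(z) + self.temporal(z)  # disentangled paths, summed
        z = z.permute(0, 2, 3, 4, 1)            # back to (B, T, H, W, hidden)
        return x + self.up(self.act(z))         # residual adapter output

# Usage: the output has the same shape as the input, so the module can
# sit alongside a frozen backbone block.
# adapter = DualPathwaySTAdapter(dim=768)
# x = torch.randn(2, 8, 14, 14, 768)
# y = adapter(x)  # (2, 8, 14, 14, 768)
```

In adapter tuning, modules of this shape are inserted next to the frozen blocks of a pre-trained image model so that only the small adapter parameters are trained, which is what keeps the adaptation overhead low.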
