
EventDance: Unsupervised Source-free Cross-modal Adaptation for Event-based Object Recognition (2403.14082v1)

Published 21 Mar 2024 in cs.CV

Abstract: In this paper, we make the first attempt at achieving cross-modal (i.e., image-to-events) adaptation for event-based object recognition without accessing any labeled source image data, owing to privacy and commercial issues. Tackling this novel problem is non-trivial due to the novelty of event cameras and the distinct modality gap between images and events. In particular, as only the source model is available, a hurdle is how to extract knowledge from the source model using only the unlabeled target event data while achieving knowledge transfer. To this end, we propose a novel framework, dubbed EventDance, for this unsupervised source-free cross-modal adaptation problem. Importantly, inspired by event-to-video reconstruction methods, we propose a reconstruction-based modality bridging (RMB) module, which reconstructs intensity frames from events in a self-supervised manner. This makes it possible to build surrogate images from which to extract knowledge (i.e., labels) from the source model. We then propose a multi-representation knowledge adaptation (MKA) module that transfers the knowledge to target models that learn from events in multiple representation types, so as to fully explore the spatiotemporal information of events. The two modules connecting the source and target models are mutually updated so as to achieve the best performance. Experiments on three benchmark datasets with two adaptation settings show that EventDance is on par with prior methods utilizing the source data.
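
To make the idea of "multiple representation types" more concrete, the following is a minimal sketch of two common ways to turn a raw event stream into a tensor-like input: a polarity count frame (spatial only) and a temporal voxel grid (spatial plus coarse timing). The abstract does not say which representations EventDance uses, so both choices, the (x, y, t, p) event layout, and the function names event_count_frame and voxel_grid are assumptions made here for illustration only.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, timestamp, polarity), polarity in {-1, +1}.
Event = Tuple[int, int, float, int]


def event_count_frame(events: List[Event], width: int, height: int) -> List[List[int]]:
    """Sum signed polarities per pixel; all timing information is discarded."""
    frame = [[0] * width for _ in range(height)]
    for x, y, _t, p in events:
        frame[y][x] += p
    return frame


def voxel_grid(events: List[Event], width: int, height: int,
               num_bins: int) -> List[List[List[int]]]:
    """Split the time span into num_bins slices and sum polarities per slice."""
    grid = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    if not events:
        return grid
    t0 = min(e[2] for e in events)
    t1 = max(e[2] for e in events)
    span = (t1 - t0) or 1.0  # avoid division by zero for a single timestamp
    for x, y, t, p in events:
        # The latest event would index bin num_bins, so clamp it to the last bin.
        b = min(int((t - t0) / span * num_bins), num_bins - 1)
        grid[b][y][x] += p
    return grid


# Toy stream on a 2x2 sensor (hypothetical data).
events = [(0, 0, 0.0, 1), (1, 0, 0.4, -1), (0, 0, 0.9, 1), (1, 1, 1.0, 1)]
frame = event_count_frame(events, width=2, height=2)
grid = voxel_grid(events, width=2, height=2, num_bins=2)
```

Summing the voxel grid over its time bins recovers the count frame, which shows that the two inputs describe the same events at different temporal resolutions. This is one plausible reason a framework might train target models on several representations at once.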

