
Register assisted aggregation for Visual Place Recognition (2405.11526v1)

Published 19 May 2024 in cs.CV

Abstract: Visual Place Recognition (VPR) is the task of using computer vision to recognize the position at which the current query image was taken. Significant appearance changes caused by season, lighting, and the time elapsed between the query images and the database images used for retrieval make place recognition difficult. Previous methods often discarded useless features (such as sky, road, and vehicles) but also indiscriminately discarded features that help improve recognition accuracy (such as buildings and trees). To preserve these useful features, we propose a new feature aggregation method. Specifically, to obtain global and local features that contain discriminative place information, we add registers on top of the original image tokens to assist in model training. After the attention weights have been reallocated, these registers are discarded. The experimental results show that the registers, surprisingly, separate unstable features from the original image representation, and that the method outperforms state-of-the-art methods.
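
The register idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the single attention layer with identity projections, the mean pooling, the random register values (in practice registers would be learned), and every name below (`self_attention`, `aggregate_with_registers`) are assumptions made for illustration only. The sketch shows only the mechanism: extra tokens join the sequence during attention and are dropped before the descriptor is built.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Assumption: a real model would use learned projections and many layers.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens))
                    for d in range(dim)])
    return out


def aggregate_with_registers(patch_tokens, register_tokens):
    """Append registers, run attention, then discard the register outputs."""
    n_patches = len(patch_tokens)
    sequence = patch_tokens + register_tokens   # registers join the sequence
    mixed = self_attention(sequence)            # attention sees all tokens
    kept = mixed[:n_patches]                    # registers are dropped here
    dim = len(kept[0])
    # Assumption: mean pooling stands in for the paper's aggregation step.
    pooled = [sum(t[d] for t in kept) / n_patches for d in range(dim)]
    norm = math.sqrt(sum(x * x for x in pooled)) or 1.0
    return [x / norm for x in pooled]           # L2-normalised descriptor


rng = random.Random(0)
patches = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
registers = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(4)]
descriptor = aggregate_with_registers(patches, registers)
print(len(descriptor))
```

Because the registers take part in the softmax, they can absorb attention that would otherwise fall on patch tokens. Discarding their outputs afterwards leaves a descriptor built only from the image tokens.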

