
Register assisted aggregation for Visual Place Recognition (2405.11526v1)

Published 19 May 2024 in cs.CV

Abstract: Visual Place Recognition (VPR) uses computer vision to determine the location at which a query image was taken. Changes in appearance caused by season, lighting, and the time elapsed between capturing the query images and the database images used for retrieval make place recognition more difficult. Previous methods often discard useless features (such as sky, road, and vehicles) but, in doing so, also indiscriminately discard features that help improve recognition accuracy (such as buildings and trees). To preserve these useful features, we propose a new feature aggregation method. Specifically, to obtain global and local features that contain discriminative place information, we add several registers on top of the original image tokens to assist model training. After attention weights are reallocated, these registers are discarded. Experimental results show that the registers, surprisingly, separate unstable features from the original image representation, and that the method outperforms state-of-the-art methods.
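
The register idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single attention layer, the mean pooling used for the global descriptor, the random weights, and all names (`register_assisted_descriptor`, `w_q`, `w_k`, `w_v`) are assumptions made only for illustration. It shows register tokens being appended to the patch tokens, taking part in attention, and then being dropped before aggregation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def register_assisted_descriptor(patch_tokens, registers, w_q, w_k, w_v):
    """Hypothetical single-layer sketch: attend over patches + registers,
    then discard the register outputs before aggregation."""
    num_patches = patch_tokens.shape[0]
    # Append the register tokens to the image (patch) tokens: (N + R, D).
    tokens = np.concatenate([patch_tokens, registers], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Scaled dot-product self-attention; registers can absorb attention mass.
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1)
    mixed = attn @ v
    # Discard the registers: keep only the rows belonging to patch tokens.
    local_features = mixed[:num_patches]
    # Assumed aggregation: mean pooling followed by L2 normalization.
    global_descriptor = local_features.mean(axis=0)
    global_descriptor = global_descriptor / (np.linalg.norm(global_descriptor) + 1e-12)
    return local_features, global_descriptor, attn

# Toy sizes (assumed): 16 patch tokens, 4 registers, embedding dimension 8.
rng = np.random.default_rng(0)
N, R, D = 16, 4, 8
patch_tokens = rng.normal(size=(N, D))
registers = rng.normal(size=(R, D))  # learnable in a real model
w_q, w_k, w_v = (rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(3))

local_features, global_descriptor, attn = register_assisted_descriptor(
    patch_tokens, registers, w_q, w_k, w_v
)
print(local_features.shape, global_descriptor.shape, attn.shape)
```

In a trained model the registers would be learned parameters and the attention would sit inside a full Vision Transformer backbone; here they are random so that the example stays self-contained.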



Authors (2)
