
What to Learn: Features, Image Transformations, or Both? (2306.13040v1)

Published 22 Jun 2023 in cs.RO

Abstract: Long-term visual localization is an essential problem in robotics and computer vision, but remains challenging due to the environmental appearance changes caused by lighting and seasons. While many existing works have attempted to solve it by directly learning invariant sparse keypoints and descriptors to match scenes, these approaches still struggle with adverse appearance changes. Recent developments in image transformations such as neural style transfer have emerged as an alternative to address such appearance gaps. In this work, we propose to combine an image transformation network and a feature-learning network to improve long-term localization performance. Given night-to-day image pairs, the image transformation network transforms the night images into day-like conditions prior to feature matching; the feature network learns to detect keypoint locations with their associated descriptor values, which can be passed to a classical pose estimator to compute the relative poses. We conducted various experiments to examine the effectiveness of combining style transfer and feature learning and its training strategy, showing that such a combination greatly improves long-term localization performance.
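To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the two-stage idea: a night-to-day image transformation network followed by a feature network that outputs keypoint scores and descriptors, whose detections would then be handed to a classical pose estimator. The class names, layer choices, and the `pose_estimator` callable are illustrative assumptions for this sketch, not the paper's actual architectures.

```python
import torch
import torch.nn as nn

class StyleTransferNet(nn.Module):
    """Illustrative image transformation network: maps a night image to a
    day-like image of the same resolution (stand-in for the paper's network)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, night_img):
        return self.net(night_img)  # day-like image

class FeatureNet(nn.Module):
    """Illustrative feature network: predicts a keypoint score map and
    L2-normalized dense descriptors."""
    def __init__(self, desc_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
        )
        self.score_head = nn.Conv2d(64, 1, 1)        # keypoint detection scores
        self.desc_head = nn.Conv2d(64, desc_dim, 1)  # per-pixel descriptors

    def forward(self, img):
        feats = self.backbone(img)
        scores = torch.sigmoid(self.score_head(feats))
        descs = nn.functional.normalize(self.desc_head(feats), dim=1)
        return scores, descs

def localize(night_img, day_img, transform_net, feature_net, pose_estimator):
    """Transform the night query to day-like conditions, extract features from
    both images, and delegate matching and relative-pose computation to a
    classical estimator (e.g. RANSAC over matched keypoints); the estimator's
    interface here is an assumption."""
    day_like = transform_net(night_img)
    scores_q, descs_q = feature_net(day_like)
    scores_r, descs_r = feature_net(day_img)
    return pose_estimator(scores_q, descs_q, scores_r, descs_r)
```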
