RL-LOGO: Deep Reinforcement Learning Localization for Logo Recognition (2312.16792v1)
Abstract: This paper proposes a novel logo image recognition approach that incorporates a localization technique based on reinforcement learning. Logo recognition is an image classification task that identifies the brand shown in an image. Because the size and position of a logo vary widely from image to image, locating the logo is important for accurate recognition. However, because logo position coordinates are not annotated, a model cannot be trained in a supervised manner to infer where the logo is in the image. We therefore propose a deep reinforcement learning localization method for logo recognition (RL-LOGO). It uses deep reinforcement learning to identify the logo region in an image without position annotations, thereby improving classification accuracy. We demonstrate a significant improvement in accuracy over existing methods on several public benchmarks. Specifically, we achieve an 18-point accuracy improvement over competitive methods on the challenging Logo-2K+ dataset. These results indicate that the proposed method is a promising approach to logo recognition in real-world applications.
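The abstract does not specify RL-LOGO's state, action set, or reward, so the following is only a minimal sketch of how annotation-free localization can be framed as a reinforcement learning problem, under stated assumptions. The state is a crop window; the hypothetical actions translate or shrink it; and the assumed reward is the change in a classifier's confidence for the image-level brand label, so no box annotation is required. `toy_score` is a hypothetical stand-in for a CNN's softmax probability: it fakes that confidence using a hidden logo box purely so the example runs, and the agent never sees that box. `rollout` uses a greedy one-step policy as a placeholder for a learned policy such as a DQN.

```python
from typing import Callable, List, Tuple

# A crop window as normalized (x0, y0, x1, y1) coordinates in [0, 1].
Box = Tuple[float, float, float, float]

# Hypothetical discrete action set: translate or shrink the window, or stop.
ACTIONS = ("left", "right", "up", "down", "shrink", "stop")


def _shift_inside(lo: float, hi: float) -> Tuple[float, float]:
    """Translate an interval back into [0, 1] without changing its length."""
    if lo < 0.0:
        lo, hi = 0.0, hi - lo
    if hi > 1.0:
        lo, hi = lo - (hi - 1.0), 1.0
    return lo, hi


def apply_action(box: Box, action: str, step: float = 0.2) -> Box:
    """Return the crop window after one action; step is a fraction of its size."""
    x0, y0, x1, y1 = box
    dx, dy = step * (x1 - x0), step * (y1 - y0)
    if action == "left":
        x0, x1 = x0 - dx, x1 - dx
    elif action == "right":
        x0, x1 = x0 + dx, x1 + dx
    elif action == "up":
        y0, y1 = y0 - dy, y1 - dy
    elif action == "down":
        y0, y1 = y0 + dy, y1 + dy
    elif action == "shrink":
        x0, x1 = x0 + dx / 2, x1 - dx / 2
        y0, y1 = y0 + dy / 2, y1 - dy / 2
    x0, x1 = _shift_inside(x0, x1)
    y0, y1 = _shift_inside(y0, y1)
    return (x0, y0, x1, y1)


def step_reward(score: Callable[[Box], float], box: Box, action: str) -> Tuple[Box, float]:
    """Assumed reward: change in classifier confidence for the brand label.

    Only the classifier's score on the crop is used, never a box annotation.
    """
    new_box = apply_action(box, action)
    return new_box, score(new_box) - score(box)


def rollout(score: Callable[[Box], float], max_steps: int = 20,
            min_gain: float = 1e-6) -> List[Box]:
    """Run one episode from the full image with a greedy one-step policy.

    Placeholder for a learned Q-network: take the action with the largest
    immediate reward, and stop when no action improves the score by min_gain.
    """
    box: Box = (0.0, 0.0, 1.0, 1.0)
    trajectory = [box]
    for _ in range(max_steps):
        best_action, best_reward = "stop", min_gain
        for action in ACTIONS[:-1]:
            _, reward = step_reward(score, box, action)
            if reward > best_reward:
                best_action, best_reward = action, reward
        if best_action == "stop":
            break
        box, _ = step_reward(score, box, best_action)
        trajectory.append(box)
    return trajectory


def toy_score(box: Box, logo: Box = (0.6, 0.6, 0.9, 0.9)) -> float:
    """Hypothetical stand-in for classifier confidence: share of the crop
    covered by a hidden logo box (used only to make the sketch runnable)."""
    ix = max(0.0, min(box[2], logo[2]) - max(box[0], logo[0]))
    iy = max(0.0, min(box[3], logo[3]) - max(box[1], logo[1]))
    return ix * iy / ((box[2] - box[0]) * (box[3] - box[1]))


traj = rollout(toy_score)
print(traj[-1], toy_score(traj[-1]))
```

Each accepted step strictly raises the (stand-in) confidence, but a greedy one-step policy can stall at a local optimum. This is one reason an RL agent is trained to maximize the discounted cumulative reward rather than the immediate gain.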