SSAP: A Shape-Sensitive Adversarial Patch for Comprehensive Disruption of Monocular Depth Estimation in Autonomous Navigation Applications (2403.11515v2)
Abstract: Monocular depth estimation (MDE) has advanced significantly, primarily through the integration of convolutional neural networks (CNNs) and, more recently, Transformers. However, concerns have emerged about their susceptibility to adversarial attacks, especially in safety-critical domains like autonomous driving and robotic navigation. Existing approaches for attacking CNN-based depth prediction fall short of inducing comprehensive disruptions to the vision system; their impact is often confined to specific local areas. In this paper, we introduce SSAP (Shape-Sensitive Adversarial Patch), a novel approach designed to comprehensively disrupt MDE in autonomous navigation applications. Our patch is crafted to selectively undermine MDE in two distinct ways: by distorting estimated distances or by creating the illusion of an object disappearing from the system's perspective. Notably, the patch is shape-sensitive: it accounts for the specific shape and scale of the target object, thereby extending its influence beyond its immediate proximity. Furthermore, the patch is trained to remain effective across different scales and distances from the camera. Experimental results demonstrate that our approach induces a mean depth estimation error surpassing 0.5, impacting up to 99% of the targeted region for CNN-based MDE models. Additionally, we investigate the vulnerability of Transformer-based MDE models to patch-based attacks, revealing that SSAP yields a significant error of 0.59 and exerts substantial influence over 99% of the target region on these models.
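The training objective sketched in the abstract, optimizing a patch so that it corrupts depth predictions across varying scales and placements, follows the general expectation-over-transformation (EOT) recipe for physical patches. The following is a minimal NumPy sketch of that loss, not the paper's implementation: `depth_model` is a toy stand-in for a real MDE network, and all function names and parameters here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def apply_patch(image, patch, scale, top_left):
    """Paste a rescaled copy of the patch onto a copy of a grayscale image."""
    h = max(1, int(patch.shape[0] * scale))
    w = max(1, int(patch.shape[1] * scale))
    # Nearest-neighbour rescale (stand-in for differentiable resampling).
    rows = (np.arange(h) * patch.shape[0] / h).astype(int)
    cols = (np.arange(w) * patch.shape[1] / w).astype(int)
    out = image.copy()
    y, x = top_left
    out[y:y + h, x:x + w] = patch[rows][:, cols]
    return out

def depth_model(image):
    """Toy stand-in for an MDE network: brighter pixels read as nearer."""
    return 1.0 - image

def attack_loss(patch, image, target_mask, n_transforms=8):
    """Mean depth error inside the target region, averaged over random patch
    scales and positions (EOT). An attacker would *maximize* this value."""
    clean_depth = depth_model(image)
    max_h = int(patch.shape[0] * 1.5)
    max_w = int(patch.shape[1] * 1.5)
    total = 0.0
    for _ in range(n_transforms):
        scale = rng.uniform(0.5, 1.5)          # vary apparent distance
        y = rng.integers(0, image.shape[0] - max_h)
        x = rng.integers(0, image.shape[1] - max_w)
        adv_depth = depth_model(apply_patch(image, patch, scale, (y, x)))
        total += np.abs(adv_depth - clean_depth)[target_mask].mean()
    return total / n_transforms
```

In a real attack the depth model would be a trained CNN or Transformer, the loss would be differentiable, and the patch pixels would be updated by gradient ascent on `attack_loss`; the random scale/position sampling is what gives the patch robustness across distances from the camera.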
- Amira Guesmi (22 papers)
- Muhammad Abdullah Hanif (60 papers)
- Ihsen Alouani (29 papers)
- Bassem Ouni (13 papers)
- Muhammad Shafique (204 papers)