Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation (2309.10519v1)
Abstract: Semantic segmentation is an essential technology for self-driving cars to comprehend their surroundings. Current real-time semantic segmentation networks commonly employ either an encoder-decoder architecture or a two-pathway architecture. Generally speaking, encoder-decoder models tend to be faster, whereas two-pathway models achieve higher accuracy. To leverage both strengths, we present the Spatial-Assistant Encoder-Decoder Network (SANet), which fuses the two architectures. Overall, we retain the encoder-decoder design while preserving the feature maps in the middle section of the encoder and using atrous convolution branches for same-resolution feature extraction. At the end of the encoder, we integrate the asymmetric pooling pyramid pooling module (APPPM) to optimize semantic extraction from the feature maps; this module uses asymmetric pooling layers to extract features at multiple resolutions. In the decoder, we present SAD, a hybrid attention module that combines horizontal and vertical attention to facilitate the fusion of the various branches. To validate the effectiveness of our approach, we evaluated SANet on the CamVid and Cityscapes real-time benchmarks, where it achieved competitive results: using a single 2080Ti GPU, SANet reaches 78.4% mIoU at 65.1 FPS on the Cityscapes test set and 78.8% mIoU at 147 FPS on the CamVid test set. The training code and models for SANet are available at https://github.com/CuZaoo/SANet-main
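The APPPM described in the abstract pools the encoder output to several deliberately non-square resolutions, processes each branch, and fuses the branches back at the original scale. Below is a minimal PyTorch sketch of that idea; the class name, pooling sizes, channel widths, and fusion strategy are illustrative assumptions rather than the authors' released implementation (see the linked repository for the actual code).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class APPPMSketch(nn.Module):
    """Illustrative asymmetric pooling pyramid: each branch pools the feature
    map to a different, non-square (H, W) resolution before a 1x1 conv, then
    all branches are upsampled to the input size and fused. The pool sizes and
    channel counts here are assumptions for demonstration only."""

    def __init__(self, in_ch, mid_ch, out_ch, pool_sizes=((1, 4), (2, 8), (4, 16))):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),                 # asymmetric pooling
                nn.Conv2d(in_ch, mid_ch, 1, bias=False),
                nn.BatchNorm2d(mid_ch),
                nn.ReLU(inplace=True),
            )
            for size in pool_sizes
        ])
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch + mid_ch * len(pool_sizes), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x]
        for branch in self.branches:
            y = branch(x)
            # restore the input resolution so all branches can be concatenated
            feats.append(F.interpolate(y, size=(h, w), mode="bilinear",
                                       align_corners=False))
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    x = torch.randn(1, 256, 32, 64)            # e.g. a low-resolution encoder feature map
    print(APPPMSketch(256, 64, 128)(x).shape)  # -> torch.Size([1, 128, 32, 64])
```

The SAD decoder module could be sketched along the same lines, with separate attention maps computed along the horizontal and vertical axes and used to weight the branches being merged; its exact formulation is likewise documented in the repository.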