A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends (2402.15490v2)
Abstract: Convolutional Neural Networks (CNNs), a subset of Deep Learning (DL), are widely used for computer vision tasks such as image classification, object detection, and image segmentation. Numerous types of CNNs have been designed to meet specific needs, including 1D, 2D, and 3D CNNs, as well as dilated, grouped, attention, and depthwise convolutions, and Neural Architecture Search (NAS), among others. Each type has its own structure and characteristics, making it suitable for specific tasks. A thorough understanding and comparative analysis of these CNN types is crucial for identifying their strengths and weaknesses. Furthermore, studying the performance, limitations, and practical applications of each type can aid the development of new and improved architectures. We also examine, from various perspectives, the platforms and frameworks that researchers use for research and development. Additionally, we explore the main research fields of CNNs, such as 6D vision, generative models, and meta-learning. This survey provides a comprehensive examination and comparison of various CNN architectures, highlighting their architectural differences and emphasizing their respective advantages, disadvantages, applications, challenges, and future trends.