Comparative Analysis of CPU and GPU Profiling for Deep Learning Models (2309.02521v3)

Published 5 Sep 2023 in cs.DC and cs.LG

Abstract: Deep Learning (DL) and Machine Learning (ML) applications have grown rapidly in recent years. Massive amounts of data are generated over the internet, from which ML and DL algorithms can derive meaningful results. Hardware resources and open-source libraries have made these algorithms easy to implement. TensorFlow and PyTorch are two of the leading frameworks for implementing ML projects. Using these frameworks, we can trace the operations executed on both the GPU and the CPU to analyze resource allocation and consumption. This paper presents the time and memory allocation of the CPU and GPU while training deep neural networks using PyTorch. The analysis shows that the GPU has a lower running time than the CPU for deep neural networks, while for simpler networks the GPU offers no significant improvement over the CPU.
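The methodology the abstract describes, timing the same training workload on CPU and GPU, can be sketched with a small timing harness. The snippet below is an illustrative sketch, not the paper's code; the `time_op` helper name and the toy pure-Python matrix-multiply workload are assumptions made here for a self-contained example. With PyTorch installed, the same harness would wrap forward/backward passes on `cpu` and `cuda` devices, and `torch.profiler` would record per-operator time and memory in far more detail.

```python
import time

def time_op(fn, repeats=5):
    """Average wall-clock time of fn() over several runs, after one warm-up.

    Hypothetical helper. Note: when timing GPU work with PyTorch, a
    synchronization call (torch.cuda.synchronize()) is needed before reading
    the clock, because CUDA kernels launch asynchronously.
    """
    fn()  # warm-up run (populates caches, triggers lazy initialization)
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

def matmul(a, b):
    """Naive pure-Python matrix multiply, used here as a stand-in workload."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]

if __name__ == "__main__":
    a = [[1.0] * 50 for _ in range(50)]
    b = [[2.0] * 50 for _ in range(50)]
    avg = time_op(lambda: matmul(a, b))
    print(f"average matmul time: {avg:.6f} s")
```

Warm-up before timing matters on both devices: the first iteration typically pays one-time costs (memory allocation, kernel compilation on GPU) that would otherwise skew the comparison the paper draws between CPU and GPU runs.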

Citations (3)

Authors (1)
