
A Fresh Perspective on DNN Accelerators by Performing Holistic Analysis Across Paradigms (2208.05294v1)

Published 10 Aug 2022 in cs.AR

Abstract: Traditional computers with von Neumann architecture are unable to meet the latency and scalability challenges of Deep Neural Network (DNN) workloads. Various DNN accelerators based on the Conventional compute Hardware Accelerator (CHA), Near-Data-Processing (NDP) and Processing-in-Memory (PIM) paradigms have been proposed to meet these challenges. Our goal in this work is to perform a rigorous comparison among the state-of-the-art accelerators from these DNN accelerator paradigms. For our analysis, we use unique layers from MobileNet, ResNet, BERT, and DLRM of the MLPerf Inference benchmark. The detailed models are based on hardware-realized state-of-the-art designs. We observe that for memory-intensive Fully Connected Layer (FCL) DNNs, the NDP-based accelerator is 10.6x faster than the state-of-the-art CHA and 39.9x faster than the PIM-based accelerator for inference. For compute-intensive image classification and object detection DNNs, the state-of-the-art CHA is ~10x faster than the NDP-based accelerator and ~2000x faster than the PIM-based accelerator for inference. PIM-based accelerators are suitable for DNN applications where energy is a constraint (~2.7x and ~21x lower energy for CNN and FCL applications, respectively, than conventional ASIC systems). Further, through a detailed sensitivity analysis of relevant components in CHA, NDP and PIM based accelerators, we identify architectural changes (such as increasing memory bandwidth and reorganizing buffers) that can increase throughput (up to a linear increase) and lower energy (up to a linear decrease) for ML applications.
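
The abstract states each speedup relative to the fastest paradigm for a workload class, so the comparison between the two slower paradigms is left implicit. The short Python sketch below derives those implied ratios from the quoted numbers. It assumes that both figures for a workload class were measured on the same layers and can therefore be divided; the abstract does not state this explicitly. The names `FCL_NDP_OVER`, `CNN_CHA_OVER`, and `implied_speedup` are illustrative and do not come from the paper.

```python
# Speedup factors quoted in the abstract (inference, fastest paradigm vs. others).
# Each entry reads: "the fastest paradigm is N times faster than <key>".
FCL_NDP_OVER = {"CHA": 10.6, "PIM": 39.9}    # memory-intensive FCL workloads, NDP is fastest
CNN_CHA_OVER = {"NDP": 10.0, "PIM": 2000.0}  # compute-intensive CNN workloads, CHA is fastest (approximate)


def implied_speedup(over, a, b):
    """Return how many times faster paradigm `a` is than paradigm `b`.

    Both paradigms are expressed relative to the same fastest baseline.
    Assumption: the two quoted ratios are comparable (same layers, same setup).
    """
    return over[b] / over[a]


fcl_cha_vs_pim = implied_speedup(FCL_NDP_OVER, "CHA", "PIM")
cnn_ndp_vs_pim = implied_speedup(CNN_CHA_OVER, "NDP", "PIM")

print(f"FCL: CHA is about {fcl_cha_vs_pim:.1f}x faster than PIM")
print(f"CNN: NDP is about {cnn_ndp_vs_pim:.0f}x faster than PIM")
```

Under that assumption, PIM is the slowest paradigm in both workload classes, which is consistent with the abstract positioning it as the choice when energy, rather than latency, is the constraint.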
