DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures (2310.10168v2)

Published 16 Oct 2023 in cs.AR

Abstract: The growing volume of data in modern applications has led to significant computational costs in conventional processor-centric systems. Processing-in-memory (PIM) architectures alleviate these costs by moving computation closer to memory, reducing data movement overheads. UPMEM is the first commercially available PIM system, featuring thousands of in-order processors (DPUs) integrated within DRAM modules. However, programming a UPMEM-based system remains challenging due to the need for explicit data management and workload partitioning across DPUs. We introduce DaPPA (data-parallel processing-in-memory architecture), a programming framework that eases the programmability of UPMEM systems by automatically managing data movement, memory allocation, and workload distribution. The key idea behind DaPPA is to leverage a high-level data-parallel pattern-based programming interface to abstract hardware complexities away from the programmer. DaPPA comprises three main components: (i) data-parallel pattern APIs, a collection of five primary data-parallel pattern primitives that allow the programmer to express data transformations within an application; (ii) a dataflow programming interface, which allows the programmer to define how data moves across data-parallel patterns; and (iii) a dynamic template-based compilation, which leverages code skeletons and dynamic code transformations to convert data-parallel patterns implemented via the dataflow programming interface into an optimized UPMEM binary. We evaluate DaPPA using six workloads from the PrIM benchmark suite on a real UPMEM system. Compared to hand-tuned implementations, DaPPA improves end-to-end performance by 2.1x, on average, and reduces programming complexity (measured in lines-of-code) by 94%. Our results demonstrate that DaPPA is an effective programming framework for efficient and user-friendly programming on UPMEM systems.
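To make the pattern-based idea concrete, the sketch below shows what a pattern-chained program in the spirit of DaPPA might look like. This is a minimal, host-only C++ illustration: the names PimVector, map, combine, and reduce are hypothetical stand-ins chosen for this example, not DaPPA's actual API, and no UPMEM hardware or SDK calls are used. It only conveys the programming model of expressing a computation as a chain of data-parallel patterns.

```cpp
// Illustrative sketch only: PimVector, map, combine, and reduce are hypothetical
// stand-ins for data-parallel pattern primitives, not DaPPA's actual API.
// The sketch runs entirely on the host so it is self-contained; in DaPPA, an
// equivalent pattern chain would be compiled from code skeletons into a UPMEM
// binary, with DPU allocation, data transfers, and workload partitioning
// handled by the framework.
#include <cstdint>
#include <functional>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical handle for data that a framework would place in DPU-attached DRAM.
template <typename T>
class PimVector {
public:
    explicit PimVector(std::vector<T> data) : data_(std::move(data)) {}

    // "map" pattern: apply a function element-wise (conceptually, each DPU
    // would process its own partition of the data in parallel).
    template <typename F>
    PimVector map(F f) const {
        std::vector<T> out;
        out.reserve(data_.size());
        for (const T& x : data_) out.push_back(f(x));
        return PimVector(std::move(out));
    }

    // Two-input element-wise pattern, e.g. for vector addition.
    template <typename F>
    PimVector combine(const PimVector& other, F f) const {
        std::vector<T> out;
        out.reserve(data_.size());
        for (std::size_t i = 0; i < data_.size(); ++i)
            out.push_back(f(data_[i], other.data_[i]));
        return PimVector(std::move(out));
    }

    // "reduce" pattern: per-DPU partial results would be merged on the host.
    template <typename F>
    T reduce(T init, F f) const {
        return std::accumulate(data_.begin(), data_.end(), init, f);
    }

private:
    std::vector<T> data_;
};

int main() {
    // Vector addition followed by a global sum, expressed as a chain of patterns.
    PimVector<std::int64_t> a(std::vector<std::int64_t>(1 << 10, 2));
    PimVector<std::int64_t> b(std::vector<std::int64_t>(1 << 10, 3));

    std::int64_t total = a.combine(b, std::plus<std::int64_t>())
                          .reduce(0, std::plus<std::int64_t>());

    std::cout << "total = " << total << "\n";  // (2 + 3) * 1024 = 5120
    return 0;
}
```

For comparison, a hand-written UPMEM implementation of the same computation would need explicit DPU allocation, host-to-MRAM data transfers, per-DPU (and per-tasklet) work splitting, and staging of data through the DPUs' scratchpad memory; eliminating that boilerplate is what the paper's reported 94% lines-of-code reduction refers to.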

References (16)
  1. S. Ghose et al., “Processing-in-Memory: A Workload-Driven Perspective,” IBM JRD, 2019.
  2. O. Mutlu et al., “A Modern Primer on Processing in Memory,” Emerging Computing: From Devices to Systems - Looking Beyond Moore and Von Neumann, 2021.
  3. G. F. Oliveira et al., “DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks,” IEEE Access, 2021.
  4. S. Ghose et al., “The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption,” in Beyond-CMOS Technologies for Next Generation Computer Design, 2019.
  5. O. Mutlu et al., “Enabling Practical Processing in and Near Memory for Data-Intensive Computing,” in DAC, 2019.
  6. O. Mutlu, “Memory Scaling: A Systems Architecture Perspective,” in IMW, 2013.
  7. W. H. Kautz, “Cellular Logic-in-Memory Arrays,” IEEE TC, 1969.
  8. H. S. Stone, “A Logic-in-Memory Computer,” IEEE TC, 1970.
  9. UPMEM, “UPMEM Website,” https://www.upmem.com, 2023.
  10. UPMEM, “Introduction to UPMEM PIM. Processing-in-memory (PIM) on DRAM Accelerator (White Paper),” 2018.
  11. Y.-C. Kwon et al., “25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2 TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications,” in ISSCC, 2021.
  12. S. Lee et al., “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product,” in ISCA, 2021.
  13. L. Ke et al., “Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM,” IEEE Micro, 2021.
  14. J. Gómez-Luna et al., “Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture,” arXiv:2105.03814 [cs.AR], 2021.
  15. M. Cole, “Bringing Skeletons Out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming,” Parallel Computing, 2004.
  16. SAFARI Research Group, “PrIM Benchmark Suite,” https://github.com/CMU-SAFARI/prim-benchmarks.
Citations (4)
