DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures (2310.10168v2)
Abstract: The growing volume of data in modern applications has led to significant computational costs in conventional processor-centric systems. Processing-in-memory (PIM) architectures alleviate these costs by moving computation closer to memory, reducing data movement overheads. UPMEM is the first commercially available PIM system, featuring thousands of in-order processors (DPUs) integrated within DRAM modules. However, programming UPMEM-based systems remains challenging due to the need for explicit data management and workload partitioning across DPUs. We introduce DaPPA (data-parallel processing-in-memory architecture), a programming framework that eases the programmability of UPMEM systems by automatically managing data movement, memory allocation, and workload distribution. The key idea behind DaPPA is to leverage a high-level data-parallel pattern-based programming interface to abstract hardware complexities away from the programmer. DaPPA comprises three main components: (i) data-parallel pattern APIs, a collection of five primary data-parallel pattern primitives that allow the programmer to express data transformations within an application; (ii) a dataflow programming interface, which allows the programmer to define how data moves across data-parallel patterns; and (iii) a dynamic template-based compilation scheme, which leverages code skeletons and dynamic code transformations to convert data-parallel patterns implemented via the dataflow programming interface into an optimized UPMEM binary. We evaluate DaPPA using six workloads from the PrIM benchmark suite on a real UPMEM system. Compared to hand-tuned implementations, DaPPA improves end-to-end performance by 2.1x, on average, and reduces programming complexity (measured in lines-of-code) by 94%. Our results demonstrate that DaPPA is an effective programming framework for efficient and user-friendly programming on UPMEM systems.
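To make the abstract's programming model concrete, the sketch below illustrates the general idea of composing data-parallel pattern primitives into a dataflow pipeline. This is a minimal host-side illustration only: DaPPA's actual API targets UPMEM DPUs (and the UPMEM SDK is C-based), so the `Pipeline` class, method names, and sequential execution here are hypothetical stand-ins for the paper's real interface and compilation scheme.

```python
from functools import reduce as _reduce

class Pipeline:
    """Hypothetical dataflow of data-parallel pattern stages, applied in order.
    Stand-in for a pattern-based interface like the one the abstract describes."""
    def __init__(self):
        self.stages = []

    def map(self, fn):
        self.stages.append(("map", fn))
        return self

    def filter(self, pred):
        self.stages.append(("filter", pred))
        return self

    def reduce(self, fn, init):
        self.stages.append(("reduce", fn, init))
        return self

    def run(self, data):
        # A real PIM framework would partition `data` across DPUs and
        # compile the stages into an optimized binary; here we simply
        # execute each pattern on the host to show the abstraction.
        for stage in self.stages:
            if stage[0] == "map":
                data = [stage[1](x) for x in data]
            elif stage[0] == "filter":
                data = [x for x in data if stage[1](x)]
            elif stage[0] == "reduce":
                data = _reduce(stage[1], data, stage[2])
        return data

# Example dataflow: sum of squares of the even numbers in [0, 10).
result = (Pipeline()
          .filter(lambda x: x % 2 == 0)
          .map(lambda x: x * x)
          .reduce(lambda a, b: a + b, 0)
          .run(range(10)))
print(result)  # 0,2,4,6,8 -> 0,4,16,36,64 -> 120
```

The point of such an interface is that the programmer states *what* transformations happen and how data flows between them, while the framework decides data placement, transfers, and per-DPU work division.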
- S. Ghose et al., “Processing-in-Memory: A Workload-Driven Perspective,” IBM JRD, 2019.
- O. Mutlu et al., “A Modern Primer on Processing in Memory,” Emerging Computing: From Devices to Systems - Looking Beyond Moore and Von Neumann, 2021.
- G. F. Oliveira et al., “DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks,” IEEE Access, 2021.
- S. Ghose et al., “The Processing-in-Memory Paradigm: Mechanisms to Enable Adoption,” in Beyond-CMOS Technologies for Next Generation Computer Design, 2019.
- O. Mutlu et al., “Enabling Practical Processing in and Near Memory for Data-Intensive Computing,” in DAC, 2019.
- O. Mutlu, “Memory Scaling: A Systems Architecture Perspective,” in IMW, 2013.
- W. H. Kautz, “Cellular Logic-in-Memory Arrays,” IEEE TC, 1969.
- H. S. Stone, “A Logic-in-Memory Computer,” IEEE TC, 1970.
- UPMEM, “UPMEM Website,” https://www.upmem.com, 2023.
- UPMEM, “Introduction to UPMEM PIM. Processing-in-memory (PIM) on DRAM Accelerator (White Paper),” 2018.
- Y.-C. Kwon et al., “25.4 A 20nm 6GB Function-In-Memory DRAM, Based on HBM2 with a 1.2 TFLOPS Programmable Computing Unit Using Bank-Level Parallelism, for Machine Learning Applications,” in ISSCC, 2021.
- S. Lee et al., “Hardware Architecture and Software Stack for PIM Based on Commercial DRAM Technology: Industrial Product,” in ISCA, 2021.
- L. Ke et al., “Near-Memory Processing in Action: Accelerating Personalized Recommendation with AxDIMM,” IEEE Micro, 2021.
- J. Gómez-Luna et al., “Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture,” arXiv:2105.03814 [cs.AR], 2021.
- M. Cole, “Bringing Skeletons Out of the Closet: A Pragmatic Manifesto for Skeletal Parallel Programming,” Parallel Computing, 2004.
- SAFARI Research Group, “PrIM Benchmark Suite,” https://github.com/CMU-SAFARI/prim-benchmarks.