Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000 (2210.12230v1)

Published 21 Oct 2022 in cs.PL, cs.DC, and cs.PF

Abstract: As the hardware industry moves towards using specialized heterogeneous many-cores to avoid the effects of the power wall, software developers are finding it hard to deal with the complexity of these systems. This article shares our experience when developing a programming model and its supporting compiler and libraries for Matrix-3000, which is designed for next-generation exascale supercomputers but has a complex memory hierarchy and processor organization. To assist its software development, we developed a software stack from scratch that includes a low-level programming interface and a high-level OpenCL compiler. Our low-level programming model offers native programming support for using the bare-metal accelerators of Matrix-3000, while the high-level model allows programmers to use the OpenCL programming standard. We detail our design choices and highlight the lessons learned from developing systems software to enable the programming of bare-metal accelerators. Our programming models have been deployed to the production environment of an exascale prototype system.

Authors (7)
  1. Jianbin Fang
  2. Peng Zhang
  3. Chun Huang
  4. Tao Tang
  5. Kai Lu
  6. Ruibo Wang
  7. Zheng Wang
Citations (6)
