Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 63 tok/s
Gemini 2.5 Pro 49 tok/s Pro
GPT-5 Medium 14 tok/s Pro
GPT-5 High 19 tok/s Pro
GPT-4o 100 tok/s Pro
Kimi K2 174 tok/s Pro
GPT OSS 120B 472 tok/s Pro
Claude Sonnet 4 37 tok/s Pro
2000 character limit reached

An approach to performance portability through generic programming (2311.05038v1)

Published 8 Nov 2023 in cs.SE and cs.DC

Abstract: The expanding hardware diversity in high performance computing adds enormous complexity to scientific software development. Developers who aim to write maintainable software have two options: 1) To use a so-called data locality abstraction that handles portability internally, thereby, performance-productivity becomes a trade off. Such abstractions usually come in the form of libraries, domain-specific languages, and run-time systems. 2) To use generic programming where performance, productivity and portability are subject to software design. In the direction of the second, this work describes a design approach that allows the integration of low-level and verbose programming tools into high-level generic algorithms based on template meta-programming in C++. This enables the development of performance-portable applications targeting host-device computer architectures, such as CPUs and GPUs. With a suitable design in place, the extensibility of generic algorithms to new hardware becomes a well defined procedure that can be developed in isolation from other parts of the code. That allows scientific software to be maintainable and efficient in a period of diversifying hardware in HPC. As proof of concept, a finite-difference modelling algorithm for the acoustic wave equation is developed and benchmarked using roofline model analysis on Intel Xeon Gold 6248 CPU, Nvidia Tesla V100 GPU, and AMD MI100 GPU.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Summary

We haven't generated a summary for this paper yet.

Dice Question Streamline Icon: https://streamlinehq.com

Follow-Up Questions

We haven't generated follow-up questions for this paper yet.