A Unified, Hardware-Fitted, Cross-GPU Performance Model (1604.04997v1)
Abstract: We present a mechanism to symbolically gather performance-relevant operation counts from numerically-oriented subprograms ('kernels') expressed in the Loopy programming system, and apply these counts in a simple, linear model of kernel run time. We use a series of 'performance-instructive' kernels to fit the parameters of a unified model to the performance characteristics of GPU hardware from multiple hardware generations and vendors. We evaluate the predictive power of the model on a broad array of computational kernels relevant to scientific computing. In terms of the geometric mean, our simple, vendor- and GPU-type-independent model achieves relative accuracy comparable to that of previously published work using hardware-specific models.
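The idea of a linear run-time model fitted to operation counts can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the three operation categories, the synthetic counts and timings, and the use of ordinary least squares are not taken from the paper, and the code does not use Loopy. It only shows the general shape of "fit per-operation costs on measurement kernels, then predict a new kernel".

```python
# Hypothetical operation categories; the paper's actual feature set is not given here.
FEATURES = ["flops", "global_loads", "global_stores"]

def predict(counts, weights):
    """Linear model: run time = sum_i weights[i] * counts[i]."""
    return sum(w * c for w, c in zip(weights, counts))

def fit(count_rows, times):
    """Ordinary least squares via the normal equations (A^T A) w = A^T t."""
    n = len(count_rows[0])
    # Build the augmented matrix [A^T A | A^T t].
    m = [[sum(r[i] * r[j] for r in count_rows) for j in range(n)]
         + [sum(r[i] * t for r, t in zip(count_rows, times))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

# Synthetic "measurement kernels": each row holds operation counts
# (arbitrary units), each chosen to stress a different category.
ROWS = [[8, 1, 1], [1, 8, 1], [1, 1, 8], [4, 4, 4]]
# Made-up per-operation costs used only to generate synthetic timings.
TRUE_WEIGHTS = [0.5, 2.0, 3.0]
TIMES = [predict(row, TRUE_WEIGHTS) for row in ROWS]

weights = fit(ROWS, TIMES)
new_kernel = [2, 3, 1]  # counts for a kernel not used in fitting
print(dict(zip(FEATURES, weights)), predict(new_kernel, weights))
```

Because the synthetic timings are generated without noise, the fit recovers the made-up costs up to rounding; with real measurements the fitted weights would only approximate the hardware's behavior.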