A Non-linear GPU Thread Map for Triangular Domains (1609.01490v1)

Published 6 Sep 2016 in cs.DC

Abstract: There is a stage in the GPU computing pipeline where a grid of thread-blocks, in \textit{parallel space}, is mapped onto the problem domain, in \textit{data space}. Since the parallel space is restricted to a box-type geometry, the mapping approach is typically a $k$-dimensional bounding box (BB) that covers a $p$-dimensional data space. Threads that fall inside the domain perform computations, while threads that fall outside are discarded at runtime. In this work we study the case of mapping threads efficiently onto triangular domain problems and propose a block-space linear map $\lambda(\omega)$, based on the properties of the lower triangular matrix, that reduces the number of unnecessary threads from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. Performance results for global memory accesses show an improvement of up to $18\%$ with respect to the \textit{bounding-box} approach, placing $\lambda(\omega)$ in second place, below the \textit{rectangular-box} approach and above the \textit{recursive-partition} and \textit{upper-triangular} approaches. For shared memory scenarios, $\lambda(\omega)$ was the fastest approach, achieving a $7\%$ performance improvement while preserving thread locality. The results obtained in this work make $\lambda(\omega)$ an interesting map for efficient GPU computing on parallel problems that define a triangular domain, with or without neighborhood interactions. The extension to tetrahedral domains is analyzed, with applications to triplet-interaction n-body problems.
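The abstract does not spell out the formula for $\lambda(\omega)$, so the following is only a minimal sketch of one way a linear block index can be mapped onto a lower-triangular layout, assuming the usual inverse of the triangular numbers. The function names `triangular_map` and `wasted_blocks_bounding_box` are hypothetical and chosen for illustration; the paper's actual GPU implementation may differ (for example, in how the square root is computed on the device).

```python
import math


def triangular_map(omega):
    """Map a linear block index omega >= 0 to (row, col) with col <= row.

    Assumption: rows are filled in order, so row r starts at index
    r * (r + 1) / 2. Integer square root avoids floating-point rounding.
    """
    row = (math.isqrt(8 * omega + 1) - 1) // 2
    col = omega - row * (row + 1) // 2
    return row, col


def wasted_blocks_bounding_box(n):
    """Blocks of an n x n bounding-box grid that lie strictly above the diagonal."""
    return n * n - n * (n + 1) // 2


# Enumerate the blocks of a triangle with n = 3 block rows.
n = 3
coords = [triangular_map(w) for w in range(n * (n + 1) // 2)]
print(coords)  # [(0, 0), (1, 0), (1, 1), (2, 0), (2, 1), (2, 2)]
```

With such a map, only $n(n+1)/2$ blocks are launched instead of $n^2$, so the remaining unnecessary threads are confined to the blocks on the diagonal, which is consistent with the $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$ reduction stated in the abstract.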
