Hierarchical Jacobi Iteration for Structured Matrices on GPUs using Shared Memory (2006.16465v1)

Published 30 Jun 2020 in cs.MS, cs.NA, and math.NA

Abstract: High fidelity scientific simulations modeling physical phenomena typically require solving large linear systems of equations which result from discretization of a partial differential equation (PDE) by some numerical method. This step often takes a vast amount of computational time to complete, and therefore presents a bottleneck in simulation work. Solving these linear systems efficiently requires the use of massively parallel hardware with high computational throughput, as well as the development of algorithms which respect the memory hierarchy of these hardware architectures to achieve high memory bandwidth. In this paper, we present an algorithm to accelerate Jacobi iteration for solving structured problems on graphics processing units (GPUs) using a hierarchical approach in which multiple iterations are performed within on-chip shared memory every cycle. A domain decomposition style procedure is adopted in which the problem domain is partitioned into subdomains whose data is copied to the shared memory of each GPU block. Jacobi iterations are performed internally within each block's shared memory, avoiding the need to perform expensive global memory accesses every step. We test our algorithm on the linear systems arising from discretization of Poisson's equation in 1D and 2D, and observe speedup in convergence using our shared memory approach compared to a traditional Jacobi implementation which only uses global memory on the GPU. We observe an 8x speedup in convergence in the 1D problem and nearly a 6x speedup in the 2D case from the use of shared memory compared to a conventional GPU approach.
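
The kernel below is a minimal CUDA sketch of the shared-memory scheme described in the abstract for the 1D Poisson problem; it is not the authors' implementation. Each thread block copies a subdomain of the current iterate plus one halo value on each side into shared memory, performs several Jacobi sweeps there, and only then writes the result back to global memory. The names hierarchical_jacobi_1d, TILE, and INNER_ITERS are illustrative, and the sketch assumes zero Dirichlet boundary conditions and a problem size that is a multiple of the block size.

```cuda
#include <cuda_runtime.h>

#define TILE 256        // unknowns per block (subdomain size); illustrative choice
#define INNER_ITERS 8   // Jacobi sweeps done in shared memory per kernel launch

// One "cycle": copy a subdomain into shared memory, iterate locally, write back.
// Halo values stay frozen during the inner sweeps and are only refreshed on the
// next kernel launch, mirroring the domain-decomposition style described above.
__global__ void hierarchical_jacobi_1d(const double *u_in, double *u_out,
                                       const double *f, double h2, int n)
{
    __shared__ double s[TILE + 2];             // subdomain plus one halo point per side

    int gid = blockIdx.x * TILE + threadIdx.x; // global index of this unknown
    int lid = threadIdx.x + 1;                 // local index, shifted past the left halo

    s[lid] = u_in[gid];                        // load interior value
    if (threadIdx.x == 0)                      // left halo (0 at the physical boundary)
        s[0] = (gid > 0) ? u_in[gid - 1] : 0.0;
    if (threadIdx.x == TILE - 1)               // right halo (0 at the physical boundary)
        s[TILE + 1] = (gid < n - 1) ? u_in[gid + 1] : 0.0;
    __syncthreads();

    double rhs = h2 * f[gid];                  // cache the right-hand side in a register

    // Jacobi sweeps entirely in shared memory:
    // u_i <- (u_{i-1} + u_{i+1} + h^2 f_i) / 2
    for (int it = 0; it < INNER_ITERS; ++it) {
        double unew = 0.5 * (s[lid - 1] + s[lid + 1] + rhs);
        __syncthreads();                       // all threads have read the old values
        s[lid] = unew;
        __syncthreads();                       // all threads have written the new values
    }

    u_out[gid] = s[lid];                       // write the subdomain back to global memory
}
```

On the host, such a kernel would be launched repeatedly, swapping the roles of u_in and u_out between launches, e.g. hierarchical_jacobi_1d<<<n / TILE, TILE>>>(d_u, d_u_new, d_f, h * h, n);, until a convergence criterion on the residual is met.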

Citations (1)
