Papers
Topics
Authors
Recent
2000 character limit reached

Batched QR and SVD Algorithms on GPUs with Applications in Hierarchical Matrix Compression (1707.05141v1)

Published 13 Jul 2017 in cs.MS, cs.DS, and cs.NA

Abstract: We present high performance implementations of the QR and the singular value decomposition of a batch of small matrices hosted on the GPU with applications in the compression of hierarchical matrices. The one-sided Jacobi algorithm is used for its simplicity and inherent parallelism as a building block for the SVD of low rank blocks using randomized methods. We implement multiple kernels based on the level of the GPU memory hierarchy in which the matrices can reside and show substantial speedups against streamed cuSOLVER SVDs. The resulting batched routine is a key component of hierarchical matrix compression, opening up opportunities to perform H-matrix arithmetic efficiently on GPUs.

Citations (46)

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.