Efficient GPU implementation of randomized SVD and its applications (2110.03423v2)

Published 5 Oct 2021 in cs.LG and cs.PF

Abstract: Matrix decompositions are ubiquitous in machine learning, with applications in dimensionality reduction, data compression, and deep learning algorithms. Typical solutions for matrix decompositions have polynomial complexity, which significantly increases their computational cost and time. In this work, we leverage efficient processing operations that can be run in parallel on modern Graphics Processing Units (GPUs), the predominant computing architecture used, e.g., in deep learning, to reduce the computational burden of computing matrix decompositions. More specifically, we reformulate the randomized decomposition problem to incorporate fast matrix multiplication operations (BLAS-3) as building blocks. We show that this formulation, combined with fast random number generators, allows us to fully exploit the potential of parallel processing implemented in GPUs. Our extensive evaluation confirms the superiority of this approach over competing methods, and we release the results of this research as part of the official CUDA implementation (https://docs.nvidia.com/cuda/cusolver/index.html).
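
The abstract gives no pseudocode, but the randomized SVD it reformulates follows the standard random-projection scheme (Halko et al.), in which almost all of the work is dense matrix-matrix products (BLAS-3 GEMMs) plus a small QR and SVD. The sketch below is only illustrative: it uses CuPy as a GPU stand-in rather than the cuSOLVER routine the paper ships, and the function name, oversampling, and power-iteration parameters are assumptions, not values from the paper.

import cupy as cp  # GPU array library, used here as an illustrative stand-in for cuSOLVER

def randomized_svd(A, k, oversample=10, n_power_iters=2):
    """Approximate rank-k SVD of A via random projection (Halko et al. style).

    All heavy steps are dense matrix-matrix products (BLAS-3 GEMMs),
    which is the property the paper exploits for GPU parallelism.
    """
    m, n = A.shape
    l = min(n, k + oversample)

    # Gaussian test matrix drawn with a device-side random number generator.
    Omega = cp.random.standard_normal((n, l), dtype=A.dtype)

    # Sample the range of A: one large GEMM.
    Y = A @ Omega

    # Optional power iterations sharpen the spectrum; each is two GEMMs
    # plus a thin QR for numerical stability.
    for _ in range(n_power_iters):
        Y, _ = cp.linalg.qr(Y)
        Y = A @ (A.T @ Y)

    # Orthonormal basis of the sampled range.
    Q, _ = cp.linalg.qr(Y)

    # Project A onto the low-dimensional subspace (another GEMM),
    # then take an exact SVD of the small l x n matrix.
    B = Q.T @ A
    U_small, s, Vt = cp.linalg.svd(B, full_matrices=False)

    # Lift the left singular vectors back to the original space and truncate.
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Example usage on a random 4096 x 1024 matrix.
A = cp.random.standard_normal((4096, 1024), dtype=cp.float32)
U, s, Vt = randomized_svd(A, k=50)

Because the random sampling, the GEMMs, and the small final SVD all stay on the device, the method maps well onto GPU hardware; the cuSOLVER documentation linked in the abstract describes the production implementation.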

Authors (5)
  1. Łukasz Struski (37 papers)
  2. Przemysław Spurek (74 papers)
  3. Samuel Rodriguez Bernabeu (1 paper)
  4. Tomasz Trzciński (116 papers)
  5. Paweł Morkisz (4 papers)
Citations (3)
