Papers
Topics
Authors
Recent
2000 character limit reached

Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU (2311.00368v1)

Published 1 Nov 2023 in cs.LG and cs.MS

Abstract: In this paper, we focus on three sparse matrix operations that are relevant for machine learning applications, namely, the sparse-dense matrix multiplication (SPMM), the sampled dense-dense matrix multiplication (SDDMM), and the composition of the SDDMM with SPMM, also termed as FusedMM. We develop optimized implementations for SPMM, SDDMM, and FusedMM operations utilizing Intel oneAPI's Explicit SIMD (ESIMD) SYCL extension API. In contrast to CUDA or SYCL, the ESIMD API enables the writing of explicitly vectorized kernel code. Sparse matrix algorithms implemented with the ESIMD API achieved performance close to the peak of the targeted Intel Data Center GPU. We compare our performance results to Intel's oneMKL library on Intel GPUs and to a recent CUDA implementation for the sparse matrix operations on NVIDIA's V100 GPU and demonstrate that our implementations for sparse matrix operations outperform either.

Summary

We haven't generated a summary for this paper yet.

Slide Deck Streamline Icon: https://streamlinehq.com

Whiteboard

Dice Question Streamline Icon: https://streamlinehq.com

Open Problems

We haven't generated a list of open problems mentioned in this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.