Fast and Practical Strassen's Matrix Multiplication using FPGAs

Published 4 Jun 2024 in cs.AR | (2406.02088v1)

Abstract: Matrix multiplication is a cornerstone operation in a wide array of scientific fields, including machine learning and computer graphics. The standard algorithm for matrix multiplication has a complexity of $\mathcal{O}(n^3)$ for $n\times n$ matrices. Strassen's algorithm improves this to $\mathcal{O}(n^{2.807})$, but its practicality is limited for small to medium matrix sizes due to the large number of additions it introduces. This paper presents a novel FPGA-based implementation of Strassen's algorithm that achieves superior speed over an optimized General Matrix Multiply (GeMM) implementation for matrices as small as $n=256$. Our design, tested extensively on two high-performance FPGA accelerators (Alveo U50 and U280) across various data types, matches or surpasses the performance of a highly optimized baseline across a range of matrix sizes.
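
To make the trade-off in the abstract concrete, the following is a minimal software sketch of the textbook Strassen recursion. It is not the paper's FPGA design. Each level replaces 8 block multiplications with 7, at the cost of 18 block additions and subtractions, which is where the exponent $\log_2 7 \approx 2.807$ comes from. The function names (`strassen`, `mat_mul_naive`, and the helpers) and the `cutoff` parameter are assumptions made for illustration. The cutoff below which the sketch falls back to the standard algorithm is arbitrary and unrelated to the $n=256$ crossover reported for the FPGA implementation.

```python
def mat_add(a, b):
    """Element-wise sum of two equally sized matrices (lists of rows)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_sub(a, b):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def mat_mul_naive(a, b):
    """Standard O(n^3) matrix product, used as the recursion base case."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def split(m):
    """Split a square matrix of even size into four quadrants."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a, b, cutoff=2):
    """Multiply square matrices a and b with Strassen's recursion.

    Assumption: a and b are n x n. Sizes at or below `cutoff`, and odd
    sizes, fall back to the standard algorithm instead of being padded.
    """
    n = len(a)
    if n <= cutoff or n % 2:
        return mat_mul_naive(a, b)

    a11, a12, a21, a22 = split(a)
    b11, b12, b21, b22 = split(b)

    # Seven recursive block products (instead of eight).
    m1 = strassen(mat_add(a11, a22), mat_add(b11, b22), cutoff)
    m2 = strassen(mat_add(a21, a22), b11, cutoff)
    m3 = strassen(a11, mat_sub(b12, b22), cutoff)
    m4 = strassen(a22, mat_sub(b21, b11), cutoff)
    m5 = strassen(mat_add(a11, a12), b22, cutoff)
    m6 = strassen(mat_sub(a21, a11), mat_add(b11, b12), cutoff)
    m7 = strassen(mat_sub(a12, a22), mat_add(b21, b22), cutoff)

    # Recombine the products into the four quadrants of the result.
    c11 = mat_add(mat_sub(mat_add(m1, m4), m5), m7)
    c12 = mat_add(m3, m5)
    c21 = mat_add(m2, m4)
    c22 = mat_add(mat_add(mat_sub(m1, m2), m3), m6)

    top = [left + right for left, right in zip(c11, c12)]
    bottom = [left + right for left, right in zip(c21, c22)]
    return top + bottom
```

With integer inputs, `strassen(a, b)` returns exactly the same matrix as `mat_mul_naive(a, b)`. With floating-point inputs the two can differ by rounding error, because the additions are performed in a different order.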

Citations (1)

Authors (3)