Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication (1908.09606v3)

Published 26 Aug 2019 in cs.CC, cs.DC, and cs.PF

Abstract: We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The key idea behind COSMA is to derive an optimal (up to a factor of 0.03\% for 10MB of fast memory) sequential schedule and then parallelize it, preserving I/O optimality. To achieve this, we use the red-blue pebble game to precisely model MMM dependencies and derive a constructive and tight sequential and parallel I/O lower bound proofs. Compared to 2D or 3D algorithms, which fix processor decomposition upfront and then map it to the matrix dimensions, it reduces communication volume by up to $\sqrt{3}$ times. COSMA outperforms the established ScaLAPACK, CARMA, and CTF algorithms in all scenarios up to 12.8x (2.2x on average), achieving up to 88\% of Piz Daint's peak performance. Our work does not require any hand tuning and is maintained as an open source implementation.

Citations (87)

View on Semantic Scholar

Summary

We haven't generated a summary for this paper yet.

Summarize Now

Tweets

https://twitter.com/t3840/status/1754776039199568155

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication (1908.09606v3)

Summary

Related Papers

Tweets