A 3D Parallel Algorithm for QR Decomposition (1805.05278v1)
Published 14 May 2018 in cs.DC
Abstract: Interprocessor communication often dominates the runtime of large matrix computations. We present a parallel algorithm for computing QR decompositions whose bandwidth cost (communication volume) can be decreased at the cost of increasing its latency cost (number of messages). By varying a parameter to navigate the bandwidth/latency tradeoff, we can tune this algorithm for machines with different communication costs.
Authors:
- Grey Ballard
- James Demmel
- Laura Grigori
- Mathias Jacquelin
- Nicholas Knight
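
The bandwidth/latency tradeoff mentioned in the abstract can be illustrated with the standard alpha-beta communication model, in which runtime is estimated as (seconds per message) × (number of messages) + (seconds per word) × (number of words). The sketch below is only an illustration of that idea: the names `CommCost`, `modeled_time`, `best_setting`, and `CANDIDATES` are hypothetical, and the message and word counts are made-up values, not the paper's cost bounds or its actual tuning parameter.

```python
from typing import NamedTuple


class CommCost(NamedTuple):
    """Communication cost of one hypothetical algorithm configuration."""
    messages: float  # latency cost: number of messages sent
    words: float     # bandwidth cost: number of words moved


def modeled_time(cost: CommCost, alpha: float, beta: float) -> float:
    """Alpha-beta model: alpha seconds per message plus beta seconds per word."""
    return alpha * cost.messages + beta * cost.words


def best_setting(candidates: dict[str, CommCost], alpha: float, beta: float) -> str:
    """Return the name of the candidate with the smallest modeled time."""
    return min(candidates, key=lambda name: modeled_time(candidates[name], alpha, beta))


# Made-up configurations, NOT taken from the paper: each step that moves
# fewer words does so by sending more messages.
CANDIDATES = {
    "few_messages": CommCost(messages=100, words=1_000_000),
    "balanced": CommCost(messages=1_000, words=400_000),
    "few_words": CommCost(messages=10_000, words=100_000),
}
```

Under these assumed numbers, a machine with expensive messages (large `alpha`) favors the configuration that sends few messages, while a machine with expensive per-word transfer (large `beta`) favors the one that moves few words. This is the sense in which a single tunable parameter could adapt one algorithm to machines with different communication costs.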