A Systematic Survey of General Sparse Matrix-Matrix Multiplication (2002.11273v3)
Abstract: General Sparse Matrix-Matrix Multiplication (SpGEMM) has attracted much attention from researchers in graph analysis, scientific computing, and deep learning. Many optimization techniques have been developed for different applications and computing architectures over the past decades. The objective of this paper is to provide a structured and comprehensive overview of the research on SpGEMM. Existing studies are grouped into categories based on target architectures and design choices. Covered topics include typical applications, compression formats, general formulations, key problems and techniques, architecture-oriented optimizations, and programming models. The rationales of different algorithms are analyzed and summarized. This survey covers the progress of SpGEMM research up to 2021. Moreover, a thorough performance comparison of existing implementations is presented. Based on our findings, we highlight future research directions that encourage better designs and implementations in later studies.