
GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis (2309.16022v1)

Published 27 Sep 2023 in cs.LG, cs.AR, and cs.PF

Abstract: With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programmable Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools bridge the gap between the non-trivial FPGA development effort and the rapid emergence of new GNN models. In this paper, we propose GNNHLS, an open-source framework to comprehensively evaluate GNN inference acceleration on FPGAs via HLS. It contains a software stack for data generation and baseline deployment, along with FPGA implementations of 6 well-tuned GNN HLS kernels. We evaluate GNNHLS on 4 graph datasets with distinct topologies and scales. The results show that GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to the CPU baselines. Compared with the GPU baselines, GNNHLS achieves up to 5.16x speedup and 74.5x energy reduction.
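
To make concrete what a "GNN HLS kernel" looks like, the following is a minimal sketch of a GCN-style mean-aggregation stage written in Vitis HLS C++. It is an illustration only: the function name gcn_aggregate, the CSR graph layout, the fixed feature width FEAT_DIM, and the pragma choices are assumptions made for this sketch, not the well-tuned kernels released with GNNHLS.

#include <cstdint>

// Assumed hidden feature width for this sketch.
constexpr int FEAT_DIM = 64;

// Mean-aggregate neighbor features for every node of a CSR graph.
// row_ptr has num_nodes + 1 entries; col_idx holds neighbor IDs;
// x_in and x_out are row-major num_nodes x FEAT_DIM matrices.
extern "C" void gcn_aggregate(const int32_t *row_ptr,
                              const int32_t *col_idx,
                              const float *x_in,
                              float *x_out,
                              int num_nodes) {
    // Map arguments to AXI master ports (Vitis HLS interface pragmas).
#pragma HLS INTERFACE m_axi port=row_ptr bundle=gmem0
#pragma HLS INTERFACE m_axi port=col_idx bundle=gmem0
#pragma HLS INTERFACE m_axi port=x_in bundle=gmem1
#pragma HLS INTERFACE m_axi port=x_out bundle=gmem2

nodes:
    for (int v = 0; v < num_nodes; ++v) {
        // On-chip accumulator, fully partitioned so all feature lanes update in parallel.
        float acc[FEAT_DIM];
#pragma HLS ARRAY_PARTITION variable=acc complete

    init:
        for (int d = 0; d < FEAT_DIM; ++d) {
#pragma HLS UNROLL
            acc[d] = 0.0f;
        }

        const int begin = row_ptr[v];
        const int end = row_ptr[v + 1];

    neighbors:
        for (int e = begin; e < end; ++e) {
#pragma HLS PIPELINE II=1
            const int u = col_idx[e];
        feat:
            for (int d = 0; d < FEAT_DIM; ++d) {
#pragma HLS UNROLL
                acc[d] += x_in[u * FEAT_DIM + d];
            }
        }

        // Mean aggregation; isolated nodes produce an all-zero output row.
        const float inv_deg = (end > begin) ? 1.0f / (float)(end - begin) : 0.0f;

    write:
        for (int d = 0; d < FEAT_DIM; ++d) {
#pragma HLS UNROLL
            x_out[v * FEAT_DIM + d] = acc[d] * inv_deg;
        }
    }
}

A production design would typically split such a stage into load, compute, and store processes connected by streams (HLS dataflow) so that memory traffic overlaps with computation; the single-function form above is kept only for brevity.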
