MetaML: Automating Customizable Cross-Stage Design-Flow for Deep Learning Acceleration (2306.08746v1)

Published 14 Jun 2023 in cs.LG and cs.AR

Abstract: This paper introduces a novel optimization framework for deep neural network (DNN) hardware accelerators, enabling the rapid development of customized and automated design flows. More specifically, our approach aims to automate the selection and configuration of low-level optimization techniques, encompassing DNN and FPGA low-level optimizations. We introduce novel optimization and transformation tasks for building design-flow architectures, which are highly customizable and flexible, thereby enhancing the performance and efficiency of DNN accelerators. Our results demonstrate considerable reductions of up to 92% in DSP usage and 89% in LUT usage for two networks, while maintaining accuracy and eliminating the need for human effort or domain expertise. In comparison to state-of-the-art approaches, our design achieves higher accuracy and utilizes three times fewer DSP resources, underscoring the advantages of our proposed framework.
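
To make the abstract's idea of a design flow composed of pluggable optimization and transformation tasks concrete, below is a minimal sketch in Python. All names here (Design, Task, quantize, prune, estimate_dsp, run_flow) and the greedy per-stage selection strategy are illustrative assumptions for exposition, not the MetaML API or the paper's actual search method.

```python
# Hypothetical sketch of a customizable cross-stage design flow.
# Not the MetaML API: all classes, functions, and the cost model
# are invented here to illustrate the composition idea.
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Design:
    """A candidate accelerator design: DNN config plus FPGA-level knobs."""
    params: Dict[str, float] = field(default_factory=dict)


# A "task" transforms a design and returns a modified candidate,
# e.g. quantization, pruning, or an HLS directive rewrite.
Task = Callable[[Design], Design]


def quantize(bits: int) -> Task:
    """DNN-level task: set the weight bit-width."""
    def apply(d: Design) -> Design:
        out = Design(dict(d.params))
        out.params["weight_bits"] = bits
        return out
    return apply


def prune(sparsity: float) -> Task:
    """DNN-level task: set the target weight sparsity."""
    def apply(d: Design) -> Design:
        out = Design(dict(d.params))
        out.params["sparsity"] = sparsity
        return out
    return apply


def estimate_dsp(d: Design) -> float:
    """Toy stand-in for a resource model: narrower weights and
    higher sparsity both reduce estimated DSP demand."""
    bits = d.params.get("weight_bits", 16)
    sparsity = d.params.get("sparsity", 0.0)
    return 1000 * (bits / 16) * (1.0 - sparsity)


def run_flow(stages: List[List[Task]], base: Design) -> Design:
    """Greedy cross-stage search: at each stage, keep the task
    whose output minimizes estimated DSP usage."""
    best = base
    for stage in stages:
        best = min((task(best) for task in stage), key=estimate_dsp)
    return best


if __name__ == "__main__":
    flow = [
        [quantize(b) for b in (4, 8, 16)],    # stage 1: bit-width choices
        [prune(s) for s in (0.0, 0.5, 0.9)],  # stage 2: sparsity choices
    ]
    result = run_flow(flow, Design())
    print(result.params, "-> est. DSPs:", estimate_dsp(result))
```

The greedy stage-by-stage selection above is just one possible strategy; the point of the sketch is the structure the abstract describes, namely that stages are interchangeable tasks, so a flow can be recomposed or extended without hand-tuning each optimization.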
