
CesASMe and Staticdeps: static detection of memory-carried dependencies for code analyzers (2402.14567v1)

Published 22 Feb 2024 in cs.PF

Abstract: A variety of code analyzers, such as IACA, uiCA, llvm-mca or Ithemal, strive to statically predict the throughput of a computation kernel. Each analyzer is based on its own simplified CPU model, reasoning at the scale of a basic block. Given this diversity, evaluating their strengths and weaknesses is important to guide both their usage and their enhancement. We present CesASMe, a fully-tooled solution to evaluate code analyzers on C-level benchmarks, composed of a benchmark derivation procedure that feeds an evaluation harness. We conclude that memory-carried data dependencies are a major source of imprecision for these tools. We tackle this issue with staticdeps, a static analyzer extracting memory-carried data dependencies, including across loop iterations, from an assembly basic block. We integrate its output into uiCA, a state-of-the-art code analyzer, to evaluate staticdeps' impact on a code analyzer's precision through CesASMe.
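
To make the notion of a memory-carried dependency across loop iterations concrete, here is a minimal Python sketch. It is only an illustration and not the staticdeps algorithm: staticdeps works statically on an assembly basic block, whereas this toy replays an execution trace dynamically. The names `Access`, `trace_loop` and `memory_carried_deps`, the instruction labels `I0`/`I1`, and the use of array indices as addresses are all assumptions made for the example.

```python
from collections import namedtuple

# One memory access made by a toy instruction; kind is "load" or "store".
# (Hypothetical representation, not taken from the paper.)
Access = namedtuple("Access", ["instr", "kind", "address"])


def trace_loop(n, load_offset):
    """Trace the toy loop `for i in 1..n-1: a[i] = a[i + load_offset] + 1`.

    Each iteration runs instruction I0 (the load) then I1 (the store).
    Array indices stand in for memory addresses.
    """
    trace = []
    for i in range(1, n):
        trace.append((i, Access("I0", "load", i + load_offset)))
        trace.append((i, Access("I1", "store", i)))
    return trace


def memory_carried_deps(trace):
    """Return the set of (store_instr, load_instr, iteration_distance) triples.

    A load depends on the most recent store to the same address; the
    distance is the number of iterations separating the two accesses.
    """
    last_store = {}  # address -> (iteration, instr) of the latest store
    deps = set()
    for iteration, acc in trace:
        if acc.kind == "store":
            last_store[acc.address] = (iteration, acc.instr)
        elif acc.address in last_store:
            src_iter, src_instr = last_store[acc.address]
            deps.add((src_instr, acc.instr, iteration - src_iter))
    return deps


# a[i] = a[i-1] + 1: each load reads what the previous iteration stored.
carried = memory_carried_deps(trace_loop(5, -1))
# a[i] = a[i] + 1: each load reads an address no earlier store has touched.
independent = memory_carried_deps(trace_loop(5, 0))
```

In the first loop, the store `I1` feeds the load `I0` of the next iteration, so successive iterations are serialized through memory. An analyzer that tracks only register dependencies would see no such chain in either loop, which is the kind of imprecision the abstract attributes to memory-carried dependencies.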

