SYNPA: SMT Performance Analysis and Allocation of Threads to Cores in ARM Processors (2310.12786v1)
Abstract: Simultaneous multithreading (SMT) processors improve throughput over single-threaded processors by sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference within the core, which degrades individual application performance and can significantly increase the turnaround time of multi-program workloads. The severity of the interference depends on which co-runners share the core. Interference can therefore be mitigated by a thread-to-core allocation policy that selects the applications to run on the same core so that their mutual interference is minimized. This paper presents SYNPA, a simple approach that dynamically allocates threads to cores in an SMT processor based on their run-time behavior. The approach uses a regression model to select synergistic pairs and thereby mitigate intra-core interference. The main novelty of SYNPA is that it uses just three variables, collected at the dispatch stage from the performance counters available in current ARM processors. Experimental results show that SYNPA outperforms the default Linux scheduler by around 36%, on average, in terms of turnaround time in 8-application workloads that combine frontend-bound and backend-bound benchmarks.
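To make the pairing idea in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes each application is summarized by three dispatch-stage fractions (the names frontend, backend, and other are illustrative), uses made-up weights as a placeholder for the trained regression model, and finds the lowest-cost pairing by exhaustive enumeration, a simple stand-in for a polynomial-time matching algorithm. The names `WEIGHTS`, `pair_cost`, and `best_pairing` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical per-application sample: fractions of dispatch slots classified
# as (frontend-stalled, backend-stalled, other). The abstract only says that
# three dispatch-stage variables are used; these names are assumptions.
Sample = Tuple[float, float, float]

# Made-up weights standing in for a trained regression model.
WEIGHTS: Sample = (0.8, 1.2, 0.1)


def pair_cost(a: Sample, b: Sample) -> float:
    """Placeholder interference estimate for two co-runners on one core.

    The cost grows when both applications stress the same category,
    so a frontend-bound and a backend-bound application pair cheaply.
    """
    return sum(w * x * y for w, x, y in zip(WEIGHTS, a, b))


def best_pairing(samples: Dict[str, Sample]) -> Tuple[float, List[Tuple[str, str]]]:
    """Return (total cost, pairs) of the cheapest perfect pairing.

    Exhaustive search: the first remaining application is paired with every
    other candidate and the rest is solved recursively. This is fine for
    8 applications (105 pairings) but does not scale to large workloads.
    """
    names = sorted(samples)
    if len(names) % 2:
        raise ValueError("need an even number of applications")

    def search(remaining: List[str]) -> Tuple[float, List[Tuple[str, str]]]:
        if not remaining:
            return 0.0, []
        first, rest = remaining[0], remaining[1:]
        best: Tuple[float, List[Tuple[str, str]]] = (float("inf"), [])
        for i, partner in enumerate(rest):
            sub_cost, sub_pairs = search(rest[:i] + rest[i + 1:])
            total = pair_cost(samples[first], samples[partner]) + sub_cost
            if total < best[0]:
                best = (total, [(first, partner)] + sub_pairs)
        return best

    return search(names)


if __name__ == "__main__":
    # Illustrative samples only; not measurements from the paper.
    workload = {
        "fe1": (0.7, 0.2, 0.1),
        "fe2": (0.6, 0.3, 0.1),
        "be1": (0.1, 0.8, 0.1),
        "be2": (0.2, 0.7, 0.1),
    }
    cost, pairs = best_pairing(workload)
    print(pairs, round(cost, 3))
```

In a dynamic policy, a step like this would be repeated periodically with fresh counter samples, and each selected pair would then be pinned to the two hardware threads of one core. That pinning step is omitted here.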
Authors: Marta Navarro, Josué Feliu, Salvador Petit, María E. Gómez, Julio Sahuquillo