Papers
Topics
Authors
Recent
Assistant
AI Research Assistant
Well-researched responses based on relevant abstracts and paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses.
Gemini 2.5 Flash
Gemini 2.5 Flash 154 tok/s
Gemini 2.5 Pro 37 tok/s Pro
GPT-5 Medium 21 tok/s Pro
GPT-5 High 23 tok/s Pro
GPT-4o 96 tok/s Pro
Kimi K2 169 tok/s Pro
GPT OSS 120B 347 tok/s Pro
Claude Sonnet 4.5 36 tok/s Pro
2000 character limit reached

A NUMA-Aware Provably-Efficient Task-Parallel Platform Based on the Work-First Principle (1806.11128v4)

Published 28 Jun 2018 in cs.DC and cs.PF

Abstract: Task parallelism is designed to simplify the task of parallel programming. When executing a task parallel program on modern NUMA architectures, it can fail to scale due to the phenomenon called work inflation, where the overall processing time that multiple cores spend on doing useful work is higher compared to the time required to do the same amount of work on one core, due to effects experienced only during parallel executions such as additional cache misses, remote memory accesses, and memory bandwidth issues. It's possible to mitigate work inflation by co-locating the computation with the data, but this is nontrivial to do with task parallel programs. First, by design, the scheduling for task parallel programs is automated, giving the user little control over where the computation is performed. Second, the platforms tend to employ work stealing, which provides strong theoretical guarantees, but its randomized protocol for load balancing does not discern between work items that are far away versus ones that are closer. In this work, we propose NUMA-WS, a NUMA-aware task parallel platform engineering based on the work-first principle. By abiding by the work-first principle, we are able to obtain a platform that is work efficient, provides the same theoretical guarantees as the classic work stealing scheduler, and mitigates work inflation. Furthermore, we implemented a prototype platform by modifying Intel's Cilk Plus runtime system and empirically demonstrate that the resulting system is work efficient and scalable.

Citations (4)

Summary

We haven't generated a summary for this paper yet.

Lightbulb Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Don't miss out on important new AI/ML research

See which papers are being discussed right now on X, Reddit, and more:

“Emergent Mind helps me see which AI papers have caught fire online.”

Philip

Philip

Creator, AI Explained on YouTube