Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 45 tok/s
Gemini 2.5 Pro 54 tok/s Pro
GPT-5 Medium 22 tok/s Pro
GPT-5 High 20 tok/s Pro
GPT-4o 99 tok/s Pro
Kimi K2 183 tok/s Pro
GPT OSS 120B 467 tok/s Pro
Claude Sonnet 4 39 tok/s Pro
2000 character limit reached

Efficient Hybrid Execution of C++ Applications using Intel(R) Xeon Phi(TM) Coprocessor (1211.5530v1)

Published 23 Nov 2012 in cs.DC and cs.SE

Abstract: The introduction of Intel(R) Xeon Phi(TM) coprocessors opened up new possibilities in development of highly parallel applications. The familiarity and flexibility of the architecture together with compiler support integrated into the Intel C++ Composer XE allows the developers to use familiar programming paradigms and techniques, which are usually not suitable for other accelerated systems. It is now easy to use complex C++ template-heavy codes on the coprocessor, including for example the Intel Threading Building Blocks (TBB) parallelization library. These techniques are not only possible, but usually efficient as well, since host and coprocessor are of the same architectural family, making optimization techniques designed for the Xeon CPU also beneficial on Xeon Phi. As a result, highly optimized Xeon codes (like the TBB library) work well on both. In this paper we present a new parallel library construct, which makes it easy to apply a function to every member of an array in parallel, dynamically distributing the work between the host CPUs and one or more coprocessor cards. We describe the associated runtime support and use a physical simulation example to demonstrate that our library construct can be used to quickly create a C++ application that will significantly benefit from hybrid execution, simultaneously exploiting CPU cores and coprocessor cores. Experimental results show that one optimized source code is sufficient to make the host and the coprocessors run efficiently.

Citations (11)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.