Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations (2104.13248v1)

Published 27 Apr 2021 in cs.DC

Abstract: Computed Tomography (CT) is a key 3D imaging technology that fundamentally relies on the compute-intensive back-projection operation to generate 3D volumes. GPUs are typically used for back-projection in production CT devices. However, with the rise of power-constrained micro-CT devices, and the emergence of CPUs comparable in performance to GPUs, back-projection on CPUs could become favorable. Unlike on GPUs, extracting parallelism for back-projection algorithms on CPUs is complex because parallelism and locality are not explicitly defined and controlled by the programmer, as they are when using CUDA, for instance. We propose a collection of novel back-projection algorithms that reduce the arithmetic computation, robustly enable vectorization, enforce a regular memory access pattern, and maximize data locality. We also implement the novel algorithms as efficient back-projection kernels that are performance portable over a wide range of CPUs. Performance evaluation using a variety of CPUs from different vendors and generations demonstrates that our back-projection implementation achieves on average a 5.2x speedup over the multi-threaded implementation of the most widely used, and optimized, open library. With a state-of-the-art CPU, we reach performance that rivals top-performing GPUs.
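
For readers unfamiliar with the operation, the following is a minimal sketch of a naive, unfiltered, pixel-driven back-projection for 2D parallel-beam geometry. It is not the paper's algorithm, which the abstract does not describe in detail; the function name `back_project`, the geometry, and the linear interpolation are all assumptions chosen for illustration. The inner loop walks each image row contiguously and updates the detector coordinate incrementally, which hints at the kind of regular memory access and reduced arithmetic the abstract refers to.

```python
import math

def back_project(sinogram, angles, size):
    """Naive 2D parallel-beam back-projection (illustrative only).

    sinogram: list of projections, one list of detector samples per angle
    angles:   projection angles in radians, one per projection
    size:     width and height of the square output image
    """
    n_det = len(sinogram[0])
    center = (size - 1) / 2.0        # image centre in pixel units
    det_center = (n_det - 1) / 2.0   # detector centre in sample units
    image = [[0.0] * size for _ in range(size)]

    for proj, theta in zip(sinogram, angles):
        c, s = math.cos(theta), math.sin(theta)
        for y in range(size):
            row = image[y]
            # Detector coordinate of the first pixel in this row.
            t = (0 - center) * c + (y - center) * s + det_center
            for x in range(size):
                i = math.floor(t)
                if 0 <= i < n_det - 1:
                    w = t - i
                    # Linear interpolation between neighbouring samples.
                    row[x] += (1.0 - w) * proj[i] + w * proj[i + 1]
                # The coordinate is affine in x, so step it by cos(theta)
                # instead of recomputing it for every pixel.
                t += c
    return image
```

With a constant sinogram and a detector wide enough to cover the image at every angle, each pixel simply accumulates one unit per projection, which makes a convenient sanity check.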

Authors (7)
  1. Peng Chen (324 papers)
  2. Mohamed Wahib (38 papers)
  3. Xiao Wang (507 papers)
  4. Shinichiro Takizawa (4 papers)
  5. Takahiro Hirofuchi (7 papers)
  6. Hirotaka Ogawa (2 papers)
  7. Satoshi Matsuoka (33 papers)
Citations (2)
