TapirXLA: Embedding Fork-Join Parallelism into the XLA Compiler in TensorFlow Using Tapir (1908.11338v1)
Abstract: This work introduces TapirXLA, a replacement for TensorFlow's XLA compiler that embeds recursive fork-join parallelism into XLA's low-level representation of code. Machine-learning applications rely on efficient parallel processing to achieve performance, and they employ a variety of technologies to improve performance, including compiler technology. But the compilers in machine-learning frameworks lack a deep understanding of parallelism, causing them to miss optimizations on parallel computation and thereby lose performance. This work studies how Tapir, a compiler intermediate representation (IR) that embeds parallelism into a mainstream compiler IR, can be incorporated into a compiler for machine learning to remedy this problem. TapirXLA modifies the XLA compiler in TensorFlow to employ the Tapir/LLVM compiler to optimize low-level parallel computation. TapirXLA encodes the parallelism within high-level TensorFlow operations using Tapir's representation of fork-join parallelism. TapirXLA also exposes to the compiler implementations of linear-algebra library routines whose parallel operations are encoded using Tapir's representation. We compared the performance of TensorFlow using TapirXLA against TensorFlow using an unmodified XLA compiler. On four neural-network benchmarks, TapirXLA speeds up the parallel running time of the network by a geometric-mean multiplicative factor of 30% to 100%, across four CPU architectures.
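To make the fork-join model concrete, the sketch below is a minimal, hypothetical Cilk-style example of the kind of parallelism that Tapir embeds in LLVM IR; it is not code from TapirXLA or TensorFlow. It assumes a Tapir/LLVM toolchain with Cilk support (e.g., OpenCilk's `clang -fopencilk`), and the function names `fib` and `vadd` are illustrative only.

```c
// A minimal sketch of fork-join parallelism as Tapir encodes it in LLVM IR.
// Assumes a Cilk-enabled Tapir/LLVM toolchain (e.g., `clang -fopencilk`);
// these functions are illustrative, not taken from TapirXLA itself.
#include <cilk/cilk.h>
#include <stdint.h>

// Recursive fork-join: each cilk_spawn lowers to a Tapir `detach` instruction,
// the spawned child returns control via `reattach`, and cilk_sync lowers to `sync`.
int64_t fib(int64_t n) {
    if (n < 2)
        return n;
    int64_t x = cilk_spawn fib(n - 1);  // forked child task (Tapir: detach/reattach)
    int64_t y = fib(n - 2);             // continuation runs in parallel with the child
    cilk_sync;                          // join point (Tapir: sync)
    return x + y;
}

// A parallel elementwise loop, illustrating (hypothetically) the kind of
// parallel loop TapirXLA could encode when lowering an XLA operation.
void vadd(const float *a, const float *b, float *c, int64_t n) {
    cilk_for (int64_t i = 0; i < n; ++i)  // lowered to detach/reattach/sync by Tapir
        c[i] = a[i] + b[i];
}
```

In Tapir, the spawned child and its continuation remain ordinary basic blocks connected by detach, reattach, and sync instructions, so standard LLVM analyses and optimizations can operate across the parallel control flow. That property is what TapirXLA relies on when it encodes the parallelism of TensorFlow operations and linear-algebra routines in this representation.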