Relax: Composable Abstractions for End-to-End Dynamic Machine Learning (2311.02103v1)
Abstract: Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models (LLMs). The success of these models has driven demand for deploying them across a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations that track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation, enabling cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on LLMs show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
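To make the two key ideas concrete, here is a minimal sketch in the style of Apache TVM's TVMScript frontend for Relax, which implements the abstraction this paper describes. The module, function, and tensor names (`Module`, `relu`, `main`, the `(n, 4)` shapes) are illustrative choices, and the exact surface syntax may differ from the paper's own examples. The symbolic dimension `n` is declared once in the function signature and tracked through every annotation, and the graph-level `R.function` calls a loop-level `T.prim_func` in the same module via `R.call_tir`, showing the cross-level representation.

```python
# Sketch: a Relax module with a first-class symbolic shape `n`
# (a dimension unknown until runtime) and a cross-level call from
# the graph level into a loop-level TensorIR function.
import tvm
from tvm.script import ir as I, relax as R, tir as T

@I.ir_module
class Module:
    @T.prim_func
    def relu(x: T.handle, y: T.handle):
        # Loop-level tensor program; `n` is matched from the runtime shape.
        n = T.int64()
        X = T.match_buffer(x, (n, 4), "float32")
        Y = T.match_buffer(y, (n, 4), "float32")
        for i, j in T.grid(n, T.int64(4)):
            with T.block("relu"):
                vi, vj = T.axis.remap("SS", [i, j])
                Y[vi, vj] = T.max(X[vi, vj], T.float32(0))

    @R.function
    def main(x: R.Tensor(("n", 4), "float32"),
             w: R.Tensor((4, 4), "float32")) -> R.Tensor(("n", 4), "float32"):
        cls = Module
        with R.dataflow():
            # Graph-level op: the output's symbolic shape (n, 4) is derived
            # statically even though `n` is not known at compile time.
            lv: R.Tensor(("n", 4), "float32") = R.matmul(x, w)
            # Cross-level call into the TensorIR function defined above.
            gv = R.call_tir(cls.relu, (lv,),
                            out_sinfo=R.Tensor(("n", 4), "float32"))
            R.output(gv)
        return gv
```

In TVM's implementation, a module like this is compiled end to end and executed by a virtual machine that binds `n` to the actual dimension at call time, which is what lets one compiled artifact serve inputs of varying length.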
- Ruihang Lai
- Junru Shao
- Siyuan Feng
- Steven S. Lyubomirsky
- Bohan Hou
- Wuwei Lin
- Zihao Ye
- Hongyi Jin
- Yuchen Jin
- Jiawei Liu
- Lesheng Jin
- Yaxing Cai
- Ziheng Jiang
- Yong Wu
- Sunghyun Park
- Prakalp Srivastava
- Jared G. Roesch
- Todd C. Mowry
- Tianqi Chen