Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets (2312.09401v1)
Published 14 Dec 2023 in cs.AR, cs.AI, and cs.DC
Abstract: To address the increasing compute demand of recent multi-model workloads that include heavy models such as LLMs, we propose deploying heterogeneous chiplet-based multi-chip module (MCM) accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively considers complex heterogeneity and inter-chiplet pipelining. Our experiments with this framework on GPT-2 and ResNet-50 models on a 4-chiplet system show up to 2.2x and 1.9x improvements in throughput and energy efficiency, respectively, compared to a monolithic accelerator with an optimized output-stationary dataflow.