Inter-Layer Scheduling Space Exploration for Multi-model Inference on Heterogeneous Chiplets

Published 14 Dec 2023 in cs.AR, cs.AI, and cs.DC | (2312.09401v1)

Abstract: To address the increasing compute demand of recent multi-model workloads that include heavy models such as LLMs, we propose deploying heterogeneous chiplet-based multi-chip module (MCM) accelerators. We develop an advanced scheduling framework for heterogeneous MCM accelerators that comprehensively considers complex heterogeneity and inter-chiplet pipelining. Our experiments with this framework on GPT-2 and ResNet-50 models on a 4-chiplet system show increases of up to 2.2x in throughput and 1.9x in energy efficiency compared to a monolithic accelerator with an optimized output-stationary dataflow.
