Pushing Large Language Models to the 6G Edge: Vision, Challenges, and Opportunities (2309.16739v4)
Abstract: Large language models (LLMs), which have shown remarkable capabilities, are revolutionizing AI development and potentially shaping our future. However, given their multimodal nature, the status-quo cloud-based deployment faces several critical challenges: 1) long response time; 2) high bandwidth costs; and 3) violation of data privacy. 6G mobile edge computing (MEC) systems may resolve these pressing issues. In this article, we explore the potential of deploying LLMs at the 6G edge. We start by introducing killer applications powered by multimodal LLMs, including robotics and healthcare, to highlight the need for deploying LLMs in the vicinity of end users. We then identify the critical challenges of LLM deployment at the edge and envision a 6G MEC architecture for LLMs. Furthermore, we delve into two design aspects, i.e., edge training and edge inference of LLMs. In both aspects, considering the inherent resource limitations at the edge, we discuss various cutting-edge techniques, including split learning/inference, parameter-efficient fine-tuning, quantization, and parameter-sharing inference, to facilitate the efficient deployment of LLMs. This article serves as a position paper that thoroughly identifies the motivation, challenges, and pathway for empowering LLMs at the 6G edge.
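
Of the techniques named in the abstract, parameter-efficient fine-tuning is the most self-contained to illustrate. The sketch below shows a minimal LoRA-style adapter in PyTorch that freezes a pretrained linear layer and trains only a low-rank update; the layer size, rank, and scaling constants are illustrative assumptions, not values taken from the article.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update: W x + (B A) x."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the pretrained weights stay frozen on the edge device
        # Low-rank factors: A is small-random, B starts at zero so training begins at W.
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Frozen path plus the trainable low-rank path.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage sketch: wrap one projection layer of a pretrained model (sizes are hypothetical).
layer = nn.Linear(4096, 4096)                 # stands in for a pretrained attention projection
lora_layer = LoRALinear(layer, rank=8)
trainable = sum(p.numel() for p in lora_layer.parameters() if p.requires_grad)
total = sum(p.numel() for p in lora_layer.parameters())
print(f"trainable params: {trainable} / {total}")  # only the rank-8 factors are updated
```

Because only the low-rank factors are updated, an edge device needs to store, compute gradients for, and exchange just a small fraction of the full model's parameters during fine-tuning, which is the resource-saving property such techniques provide at the edge.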