Papers

Topics

Authors

Recent

View all

Assistant

AI Research Assistant

Well-researched responses based on relevant abstracts and paper content.

Custom Instructions Pro

Preferences or requirements that you'd like Emergent Mind to consider when generating responses.

Gemini 2.5 Flash

Gemini 2.5 Flash 167 tok/s

Gemini 2.5 Pro 42 tok/s Pro

GPT-5 Medium 32 tok/s Pro

GPT-5 High 33 tok/s Pro

GPT-4o 108 tok/s Pro

Kimi K2 207 tok/s Pro

GPT OSS 120B 471 tok/s Pro

Claude Sonnet 4.5 36 tok/s Pro

2000 character limit reached

Multi-armed bandits for resource efficient, online optimization of language model pre-training: the use case of dynamic masking (2203.13151v2)

Published 24 Mar 2022 in cs.CL, cs.LG, and stat.ML

Abstract: We design and evaluate a Bayesian optimization framework for resource efficient pre-training of Transformer-based LLMs (TLMs). TLM pre-training requires high computational resources and introduces many unresolved design choices, such as selecting its pre-training hyperparameters. We propose a multi-armed bandit framework for the sequential selection of TLM pre-training hyperparameters, aimed at optimizing LLM performance, in a resource efficient manner. We design a Thompson sampling algorithm, with a surrogate Gaussian process reward model of the Masked LLM (MLM) pre-training objective, for its sequential minimization. Instead of MLM pre-training with fixed masking probabilities, the proposed Gaussian process-based Thompson sampling (GP-TS) accelerates pre-training by sequentially selecting masking hyperparameters that improve performance. We empirically demonstrate how GP-TS pre-trains LLMs efficiently, i.e., it achieves lower MLM loss in fewer epochs, across a variety of settings. In addition, GP-TS pre-trained TLMs attain competitive downstream performance, while avoiding expensive hyperparameter grid search. GP-TS provides an interactive framework for efficient and optimized TLM pre-training that, by circumventing costly hyperparameter selection, enables substantial computational savings.

Citations (1)

View on Semantic Scholar