Masking as an Efficient Alternative to Finetuning for Pretrained Language Models (2004.12406v2)

Published 26 Apr 2020 in cs.CL

Abstract: We present an efficient method of utilizing pretrained language models, where we learn selective binary masks for pretrained weights in lieu of modifying them through finetuning. Extensive evaluations of masking BERT and RoBERTa on a series of NLP tasks show that our masking scheme yields performance comparable to finetuning, yet has a much smaller memory footprint when several tasks need to be inferred simultaneously. Through intrinsic evaluations, we show that representations computed by masked language models encode information necessary for solving downstream tasks. Analyzing the loss landscape, we show that masking and finetuning produce models that reside in minima that can be connected by a line segment with nearly constant test accuracy. This confirms that masking can be utilized as an efficient alternative to finetuning.
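
To make the idea concrete, here is a minimal PyTorch sketch of learning a binary mask over a frozen pretrained weight matrix. It is not the authors' released implementation; the names `MaskedLinear` and `BinaryMask`, the threshold `tau`, and the score initialization are illustrative assumptions, as is the use of a straight-through gradient estimator for the non-differentiable thresholding step.

```python
import torch
import torch.nn as nn


class BinaryMask(torch.autograd.Function):
    """Threshold real-valued scores to {0, 1}; pass gradients straight through."""

    @staticmethod
    def forward(ctx, scores, tau):
        return (scores >= tau).float()

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the scores is passed unchanged.
        return grad_output, None


class MaskedLinear(nn.Module):
    """Linear layer with frozen pretrained weights and a learned binary mask."""

    def __init__(self, pretrained_weight, pretrained_bias=None, tau=0.005, init_scale=0.01):
        super().__init__()
        # Pretrained parameters stay fixed; only the mask scores are trained.
        self.weight = nn.Parameter(pretrained_weight.clone(), requires_grad=False)
        self.bias = (
            nn.Parameter(pretrained_bias.clone(), requires_grad=False)
            if pretrained_bias is not None else None
        )
        self.scores = nn.Parameter(torch.full_like(self.weight, init_scale))
        self.tau = tau

    def forward(self, x):
        mask = BinaryMask.apply(self.scores, self.tau)
        return nn.functional.linear(x, self.weight * mask, self.bias)


# Usage sketch: wrap a pretrained layer and optimize only the mask scores.
pretrained = nn.Linear(768, 768)
masked = MaskedLinear(pretrained.weight.data, pretrained.bias.data)
optimizer = torch.optim.Adam([masked.scores], lr=1e-4)
out = masked(torch.randn(2, 768))
```

Per task, only the binary masks need to be stored alongside the shared pretrained weights, which is what gives the memory savings the abstract describes when serving several tasks simultaneously.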

Authors (5)
  1. Mengjie Zhao (35 papers)
  2. Tao Lin (167 papers)
  3. Fei Mi (56 papers)
  4. Martin Jaggi (155 papers)
  5. Hinrich Schütze (250 papers)
Citations (109)
