Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 60 tok/s
Gemini 2.5 Pro 51 tok/s Pro
GPT-5 Medium 18 tok/s Pro
GPT-5 High 14 tok/s Pro
GPT-4o 77 tok/s Pro
Kimi K2 159 tok/s Pro
GPT OSS 120B 456 tok/s Pro
Claude Sonnet 4 38 tok/s Pro
2000 character limit reached

ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques (2103.11367v1)

Published 21 Mar 2021 in cs.CL

Abstract: Pre-trained LLMs of the BERT family have defined the state-of-the-arts in a wide range of NLP tasks. However, the performance of BERT-based models is mainly driven by the enormous amount of parameters, which hinders their application to resource-limited scenarios. Faced with this problem, recent studies have been attempting to compress BERT into a small-scale model. However, most previous work primarily focuses on a single kind of compression technique, and few attention has been paid to the combination of different methods. When BERT is compressed with integrated techniques, a critical question is how to design the entire compression framework to obtain the optimal performance. In response to this question, we integrate three kinds of compression methods (weight pruning, low-rank factorization and knowledge distillation (KD)) and explore a range of designs concerning model architecture, KD strategy, pruning frequency and learning rate schedule. We find that a careful choice of the designs is crucial to the performance of the compressed model. Based on the empirical findings, our best compressed model, dubbed Refined BERT cOmpreSsion with InTegrAted techniques (ROSITA), is $7.5 \times$ smaller than BERT while maintains $98.5\%$ of the performance on five tasks of the GLUE benchmark, outperforming the previous BERT compression methods with similar parameter budget. The code is available at https://github.com/llyx97/Rosita.

Citations (18)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.