QuantTune: Optimizing Model Quantization with Adaptive Outlier-Driven Fine Tuning (2403.06497v1)
Abstract: Transformer-based models have gained widespread popularity in both the computer vision (CV) and NLP fields. However, significant challenges arise during post-training linear quantization, leading to noticeable reductions in inference accuracy. Our study focuses on uncovering the underlying causes of these accuracy drops and proposing a quantization-friendly fine-tuning method, \textbf{QuantTune}. Firstly, our analysis revealed that, on average, 65\% of quantization errors result from the precision loss incurred by the dynamic range amplification effect of outliers across the target Transformer-based models. Secondly, \textbf{QuantTune} adjusts weights based on the deviation of outlier activations and effectively constrains the dynamic ranges of the problematic activations. As a result, it successfully mitigates the negative impact of outliers on the inference accuracy of quantized models. Lastly, \textbf{QuantTune} can be seamlessly integrated into the back-propagation pass in the fine-tuning process without requiring extra complexity in inference software and hardware design. Our approach showcases significant improvements in post-training quantization across a range of Transformer-based models, including ViT, Bert-base, and OPT. QuantTune reduces accuracy drops by 12.09\% at 8-bit quantization and 33.8\% at 7-bit compared to top calibration methods, outperforming state-of-the-art solutions by over 18.84\% across ViT models.
Collections
Sign up for free to add this paper to one or more collections.
Paper Prompts
Sign up for free to create and run prompts on this paper using GPT-5.