
Improving Ensemble Distillation With Weight Averaging and Diversifying Perturbation (2206.15047v1)

Published 30 Jun 2022 in cs.LG

Abstract: Ensembles of deep neural networks have demonstrated superior performance, but their heavy computational cost hinders their application in resource-limited environments. This motivates distilling knowledge from the ensemble teacher into a smaller student network, and there are two important design choices for this ensemble distillation: 1) how to construct the student network, and 2) what data should be shown during training. In this paper, we propose a weight averaging technique in which a student with multiple subnetworks is trained to absorb the functional diversity of ensemble teachers, and those subnetworks are then averaged for inference, yielding a single student network with no additional inference cost. We also propose a perturbation strategy that seeks inputs from which the diversities of teachers can be better transferred to the student. Combining these two, our method significantly improves upon previous methods on various image classification tasks.
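The abstract describes two components: averaging the weights of the student's subnetworks into one network for inference, and perturbing inputs to expose teacher diversity during distillation. The sketch below is a minimal illustration of both ideas in PyTorch, not the authors' implementation; the disagreement measure (variance of teacher softmax outputs) and all function names are hypothetical choices made for illustration.

```python
import copy
import torch

def average_subnetworks(subnetworks):
    """Average the weights of identically structured student subnetworks
    into a single network, so inference uses one model (sketch only)."""
    averaged = copy.deepcopy(subnetworks[0])
    avg_state = averaged.state_dict()
    for key in avg_state:
        if avg_state[key].is_floating_point():
            # Stack the corresponding parameter from every subnetwork and average.
            stacked = torch.stack([sub.state_dict()[key] for sub in subnetworks])
            avg_state[key] = stacked.mean(dim=0)
    averaged.load_state_dict(avg_state)
    return averaged  # single student network, no additional inference cost

def diversifying_perturbation(x, teachers, step_size=1e-2, steps=1):
    """Hypothetical sketch: nudge inputs toward regions where the ensemble
    teachers disagree, so their functional diversity is better exposed to
    the student during distillation. The variance-based disagreement score
    is an illustrative choice, not necessarily the paper's objective."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        probs = torch.stack([teacher(x).softmax(dim=-1) for teacher in teachers])
        disagreement = probs.var(dim=0).sum()
        grad, = torch.autograd.grad(disagreement, x)
        x = (x + step_size * grad.sign()).detach().requires_grad_(True)
    return x.detach()
```

Under these assumptions, the student would be trained with its subnetworks on (possibly perturbed) inputs, and only the averaged network from `average_subnetworks` would be deployed.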

Authors (4)
  1. Giung Nam (10 papers)
  2. Hyungi Lee (15 papers)
  3. Byeongho Heo (33 papers)
  4. Juho Lee (106 papers)
Citations (6)
