Papers
Topics
Authors
Recent
Detailed Answer
Quick Answer
Concise responses based on abstracts only
Detailed Answer
Well-researched responses based on abstracts and relevant paper content.
Custom Instructions Pro
Preferences or requirements that you'd like Emergent Mind to consider when generating responses
Gemini 2.5 Flash
Gemini 2.5 Flash 37 tok/s
Gemini 2.5 Pro 41 tok/s Pro
GPT-5 Medium 10 tok/s Pro
GPT-5 High 15 tok/s Pro
GPT-4o 84 tok/s Pro
Kimi K2 198 tok/s Pro
GPT OSS 120B 448 tok/s Pro
Claude Sonnet 4 31 tok/s Pro
2000 character limit reached

AST: Effective Dataset Distillation through Alignment with Smooth and High-Quality Expert Trajectories (2310.10541v2)

Published 16 Oct 2023 in cs.CV, cs.AI, and cs.LG

Abstract: Training large AI models typically requires large-scale datasets in the machine learning process, making training and parameter-tuning process both time-consuming and costly. Some researchers address this problem by carefully synthesizing a very small number of highly representative and informative samples from real-world datasets. This approach, known as Dataset Distillation (DD), proposes a perspective for data-efficient learning. Despite recent progress in this field, the performance of existing methods still cannot meet expectations, and distilled datasets cannot effectively replace original datasets. In this paper, unlike previous methods that focus solely on improving the effectiveness of student distillation, we recognize and leverage the important mutual influence between expert and student models. We observed that the smoothness of expert trajectories has a significant impact on subsequent student parameter alignment. Based on this, we propose an effective DD framework named AST, standing for Alignment with Smooth and high-quality expert Trajectories. We devise the integration of clipping loss and gradient penalty to regulate the rate of parameter changes in expert trajectory generation. To further refine the student parameter alignment with expert trajectory, we put forward representative initialization for the synthetic dataset and balanced inner-loop loss in response to the sensitivity exhibited towards randomly initialized variables during distillation. We also propose two enhancement strategies, namely intermediate matching loss and weight perturbation, to mitigate the potential occurrence of cumulative errors. We conduct extensive experiments on datasets of different scales, sizes, and resolutions. The results demonstrate that the proposed method significantly outperforms prior methods.

Citations (1)

Summary

We haven't generated a summary for this paper yet.

List To Do Tasks Checklist Streamline Icon: https://streamlinehq.com

Collections

Sign up for free to add this paper to one or more collections.

Lightbulb On Streamline Icon: https://streamlinehq.com

Continue Learning

We haven't generated follow-up questions for this paper yet.