On-Policy Distillation of Language Models: Learning from Self-Generated Mistakes (2306.13649v3)
Abstract: Knowledge distillation (KD) is widely used to compress a teacher model by training a smaller student model, reducing inference cost and memory footprint. However, current KD methods for auto-regressive sequence models suffer from a distribution mismatch between the output sequences seen during training and those generated by the student during inference. To address this issue, we introduce Generalized Knowledge Distillation (GKD). Instead of relying solely on a fixed set of output sequences, GKD trains the student on its self-generated output sequences by leveraging feedback from the teacher on those sequences. Unlike supervised KD approaches, GKD also offers the flexibility to employ alternative loss functions between the student and teacher, which can be useful when the student lacks the expressivity to mimic the teacher's distribution. Furthermore, GKD facilitates the seamless integration of distillation with RL fine-tuning (RLHF). We demonstrate the efficacy of GKD for distilling auto-regressive LLMs on summarization, translation, and arithmetic reasoning tasks, as well as for task-agnostic distillation for instruction-tuning.
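The abstract's core recipe (sample from the student, score the samples with the teacher, minimize a chosen divergence) can be illustrated with a short sketch. Below is a minimal, illustrative PyTorch implementation of one on-policy GKD update, assuming HuggingFace-style `student`/`teacher` causal LMs; the function name `gkd_step`, the sampling settings, and the omitted prompt/padding masking are assumptions for illustration, not the paper's reference code.

```python
import torch
import torch.nn.functional as F

def gkd_step(student, teacher, prompt_ids, beta=0.5, temperature=1.0):
    """One on-policy GKD update: the student is trained on its own
    samples, with the teacher's token-level distributions as feedback.

    Minimal sketch; assumes HuggingFace-style causal LMs. Prompt/padding
    masking and the one-position logit shift are omitted for brevity.
    """
    # 1) On-policy data: sample continuations from the current student.
    with torch.no_grad():
        sequences = student.generate(
            prompt_ids, do_sample=True, max_new_tokens=128
        )

    # 2) Score every position of the sampled sequences under both models.
    student_logits = student(sequences).logits / temperature
    with torch.no_grad():
        teacher_logits = teacher(sequences).logits / temperature

    # 3) Per-token generalized Jensen-Shannon divergence:
    #    JSD(beta) = beta*KL(P||M) + (1-beta)*KL(Q||M), M = beta*P + (1-beta)*Q.
    #    In scaled limits, beta -> 0 recovers forward KL(P||Q) and
    #    beta -> 1 recovers reverse KL(Q||P), so beta controls how
    #    mode-seeking the student's fit to the teacher is.
    p = F.softmax(teacher_logits, dim=-1)   # teacher distribution P
    q = F.softmax(student_logits, dim=-1)   # student distribution Q
    m = (beta * p + (1.0 - beta) * q).clamp_min(1e-9)
    kl_pm = (p * (p.clamp_min(1e-9).log() - m.log())).sum(-1)
    kl_qm = (q * (q.clamp_min(1e-9).log() - m.log())).sum(-1)
    loss = (beta * kl_pm + (1.0 - beta) * kl_qm).mean()

    loss.backward()  # optimizer step handled by the caller
    return loss.detach()
```

In practice one would restrict the loss to the generated (non-prompt, non-padding) positions, and could mix student samples with a fraction of fixed ground-truth sequences, recovering the supervised-KD data source mentioned in the abstract as a special case.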