Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models (2402.13064v1)
Abstract: We introduce Generalized Instruction Tuning (GLAN), a general and scalable method for instruction tuning of LLMs. Unlike prior work that relies on seed examples or existing datasets to construct instruction tuning data, GLAN exclusively utilizes a pre-curated taxonomy of human knowledge and capabilities as input and generates large-scale synthetic instruction data across all disciplines. Specifically, inspired by the systematic structure of the human education system, we build the taxonomy by decomposing human knowledge and capabilities into various fields, sub-fields and, ultimately, distinct disciplines semi-automatically, facilitated by LLMs. Subsequently, we generate a comprehensive list of subjects for every discipline and proceed to design a syllabus tailored to each subject, again utilizing LLMs. With the fine-grained key concepts detailed in every class session of the syllabus, we are able to generate diverse instructions with broad coverage across the entire spectrum of human knowledge and skills. Extensive experiments on LLMs (e.g., Mistral) demonstrate that GLAN excels in multiple dimensions, from mathematical reasoning, coding, academic exams, and logical reasoning to general instruction following, without using task-specific training data for these tasks. In addition, GLAN allows for easy customization: new fields or skills can be added by simply incorporating a new node into our taxonomy.
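The generation pipeline described above (taxonomy → subjects per discipline → syllabus per subject → instructions per class session) can be sketched as a nested traversal. This is a minimal illustration, not the paper's implementation: the `query_llm` stub, the taxonomy contents, and the prompt wordings are all hypothetical stand-ins for real LLM calls and the paper's actual prompts.

```python
def query_llm(prompt: str) -> list[str]:
    """Placeholder for a real LLM call (e.g. to GPT-4); returns canned
    items so the sketch is runnable. The keyword dispatch below is purely
    illustrative."""
    canned = {
        "subjects": ["Calculus", "Linear Algebra"],
        "syllabus": ["Limits and continuity", "Derivatives"],
        "instructions": ["Compute the derivative of f(x) = x^2 at x = 3."],
    }
    for key, items in canned.items():
        if key in prompt:
            return items
    return []

def generate_instructions(taxonomy: dict[str, list[str]]) -> list[str]:
    """Walk field -> discipline -> subject -> class session, generating
    instructions from the key concepts of each session."""
    data: list[str] = []
    for field, disciplines in taxonomy.items():
        for discipline in disciplines:
            subjects = query_llm(f"List subjects for the discipline {discipline}.")
            for subject in subjects:
                sessions = query_llm(
                    f"Design a syllabus for {subject}: class sessions with key concepts."
                )
                for session in sessions:
                    data.extend(query_llm(
                        f"Write diverse instructions covering the key concepts of '{session}'."
                    ))
    return data

# Customization is a matter of adding a node to the taxonomy dict.
taxonomy = {"Natural Sciences": ["Mathematics"]}
print(len(generate_instructions(taxonomy)))
```

With the canned outputs above, one discipline expands to 2 subjects × 2 sessions × 1 instruction each, so the sketch prints 4; in the actual method each level fans out much more widely.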
- Program synthesis with large language models. arXiv preprint arXiv:2108.07732, 2021.
- Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
- Think you have solved question answering? Try ARC, the AI2 Reasoning Challenge. arXiv preprint arXiv:1803.05457, 2018.
- Sahil Chaudhary. Code alpaca: An instruction-following llama model for code generation. https://github.com/sahil280114/codealpaca, 2023.
- Free Dolly: Introducing the world’s first truly open instruction-tuned LLM, 2023.
- Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
- Vicuna: An open-source chatbot impressing GPT-4 with 90%* ChatGPT quality, March 2023.
- Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
- Enhancing chat language models by scaling high-quality instructional conversations. arXiv preprint arXiv:2305.14233, 2023.
- The false promise of imitating proprietary llms. arXiv preprint arXiv:2305.15717, 2023.
- Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020.
- The curious case of neural text degeneration. arXiv preprint arXiv:1904.09751, 2019.
- Measuring mathematical problem solving with the MATH dataset. NeurIPS, 2021.
- Training compute-optimal large language models. arXiv preprint arXiv:2203.15556, 2022.
- Unnatural instructions: Tuning language models with (almost) no human labor. arXiv preprint arXiv:2212.09689, 2022.
- Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
- Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
- Textbooks are all you need ii: phi-1.5 technical report. arXiv preprint arXiv:2309.05463, 2023.
- The flan collection: Designing data and methods for effective instruction tuning. arXiv preprint arXiv:2301.13688, 2023.
- Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. arXiv preprint arXiv:2308.09583, 2023.
- Wizardcoder: Empowering code large language models with evol-instruct. arXiv preprint arXiv:2306.08568, 2023.
- Orca 2: Teaching small language models how to reason. arXiv preprint arXiv:2311.11045, 2023.
- Orca: Progressive learning from complex explanation traces of GPT-4. arXiv preprint arXiv:2306.02707, 2023.
- Crosslingual generalization through multitask finetuning. arXiv preprint arXiv:2211.01786, 2022.
- Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
- Large language model alignment: A survey. arXiv preprint arXiv:2309.15025, 2023.
- Beyond the imitation game: Quantifying and extrapolating the capabilities of language models, 2023.
- Challenging big-bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261, 2022.
- Multitask prompted training enables zero-shot task generalization. In The Tenth International Conference on Learning Representations, 2022.
- Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
- Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
- Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652, 2021.
- Education. Wikipedia, 2023. Last edited on 24 March 2023.
- Self-instruct: Aligning language models with self-generated instructions. arXiv preprint arXiv:2212.10560, 2022.
- Super-NaturalInstructions: Generalization via declarative instructions on 1600+ NLP tasks. arXiv preprint arXiv:2204.07705, 2022.
- Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
- Skywork: A more open bilingual foundation model, 2023.
- Baize: An open-source chat model with parameter-efficient tuning on self-chat data. arXiv preprint arXiv:2304.01196, 2023.
- Wizardlm: Empowering large language models to follow complex instructions. arXiv preprint arXiv:2304.12244, 2023.
- Metamath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284, 2023.
- Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911, 2023.
- Lima: Less is more for alignment. arXiv preprint arXiv:2305.11206, 2023.