MoDS: Model-oriented Data Selection for Instruction Tuning (2311.15653v1)
Abstract: Instruction tuning has become the de facto method for equipping LLMs with the ability to follow user instructions. Usually, hundreds of thousands or even millions of instruction-following pairs are used to fine-tune foundation LLMs. Recently, some studies have shown that a small amount of high-quality instruction data can be sufficient. However, how to select appropriate instruction data for a given LLM remains an open problem. To address this problem, we present a model-oriented data selection (MoDS) approach, which selects instruction data based on new criteria covering three aspects: quality, coverage and necessity. First, our approach uses a quality evaluation model to filter a high-quality subset out of the original instruction dataset, and then applies an algorithm to further select from this subset a seed instruction dataset with good coverage. The seed dataset is used to fine-tune the foundation LLM, yielding an initial instruction-following LLM. Finally, we develop a necessity evaluation model to identify the instruction data on which this initial model performs badly, and treat these as necessary instructions for further improving the LLM. In this way, we obtain a small high-quality, broad-coverage and high-necessity subset of the original instruction dataset. Experimental results show that a model fine-tuned with 4,000 instruction pairs selected by our approach outperforms a model fine-tuned with the full original dataset of 214k instruction pairs.
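
The three-stage selection described in the abstract can be sketched compactly. The snippet below is a minimal illustration, not the authors' implementation: it assumes precomputed instruction embeddings, numeric `quality_score` and `necessity_score` functions standing in for the paper's quality and necessity evaluation models, hypothetical threshold parameters, and a farthest-point (k-center greedy style) heuristic for the coverage step; fine-tuning the seed model between stages two and three is only indicated by a comment.

```python
import numpy as np

def k_center_greedy(embeddings, k):
    """Farthest-point sampling: greedily pick points that spread over the
    embedding space, as a stand-in for the coverage-oriented seed selection."""
    n = embeddings.shape[0]
    selected = [0]  # start from an arbitrary point
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    for _ in range(1, min(k, n)):
        idx = int(np.argmax(dists))          # point farthest from current seeds
        selected.append(idx)
        dists = np.minimum(dists, np.linalg.norm(embeddings - embeddings[idx], axis=1))
    return selected

def mods_select(pairs, embeddings, quality_score, necessity_score,
                quality_threshold=0.5, seed_size=1000, necessity_threshold=0.5):
    """Sketch of quality filter -> coverage seed -> necessity augmentation.
    Scorers and thresholds are illustrative placeholders, not the paper's values."""
    # 1) keep only the high-quality instruction pairs
    hq_idx = [i for i, p in enumerate(pairs) if quality_score(p) >= quality_threshold]
    # 2) pick a broad-coverage seed subset from the high-quality pool
    seed_local = k_center_greedy(embeddings[hq_idx], seed_size)
    seed_idx = [hq_idx[i] for i in seed_local]
    # (fine-tune the foundation LLM on the seed subset here, then score
    #  necessity from the resulting model's responses)
    # 3) add high-quality pairs the seed-tuned model still handles badly
    seed_set = set(seed_idx)
    augment_idx = [i for i in hq_idx
                   if i not in seed_set and necessity_score(pairs[i]) >= necessity_threshold]
    return seed_idx + augment_idx
```

The thresholds and seed size here are purely illustrative knobs; in practice they would be tuned to the target budget of selected instruction pairs (e.g. the 4,000-pair subset reported in the abstract).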