Chat Vector: A Simple Approach to Equip LLMs with Instruction Following and Model Alignment in New Languages (2310.04799v3)
Abstract: The development of open-source LLMs has advanced rapidly in recent years. Nevertheless, due to data constraints, the capabilities of most open-source LLMs are primarily focused on English. To address this issue, we introduce the concept of the $\textit{chat vector}$ to equip pre-trained LLMs with instruction following and human value alignment via simple model arithmetic. The chat vector is derived by subtracting the weights of a pre-trained base model (e.g., LLaMA2) from those of its corresponding chat model (e.g., LLaMA2-chat). By simply adding the chat vector to the weights of a continually pre-trained model, we can endow that model with chat capabilities in the new language without any further training. Our empirical studies demonstrate the efficacy of the chat vector from three aspects: instruction following, toxicity mitigation, and multi-turn dialogue. Moreover, to showcase the adaptability of our approach, we extend our experiments to various languages, base models, and chat vectors. The results underscore the chat vector's simplicity, effectiveness, and wide applicability, making it a compelling solution for efficiently enabling conversational capabilities in pre-trained LLMs. Our code is available at https://github.com/aqweteddy/ChatVector.
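The model arithmetic described in the abstract reduces to a per-tensor subtraction and addition over three checkpoints that share an architecture. Below is a minimal sketch of that procedure, assuming the Hugging Face `transformers` and PyTorch APIs; the model identifiers and output path are placeholders, and the shape check that skips resized tensors is an illustrative assumption, not the paper's documented handling.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoints; substitute the base, chat, and
# continually pre-trained (CP) models you are working with.
BASE = "meta-llama/Llama-2-7b-hf"        # pre-trained base model
CHAT = "meta-llama/Llama-2-7b-chat-hf"   # its aligned chat variant
CP   = "path/to/continual-pretrained"    # base model continually pre-trained on a new language

base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
chat = AutoModelForCausalLM.from_pretrained(CHAT, torch_dtype=torch.bfloat16)
cp   = AutoModelForCausalLM.from_pretrained(CP, torch_dtype=torch.bfloat16)

base_state = base.state_dict()
chat_state = chat.state_dict()

with torch.no_grad():
    for name, cp_param in cp.named_parameters():
        # Chat vector = chat weights - base weights, added tensor-wise
        # to the CP model. Tensors whose shapes differ (e.g., embeddings
        # resized for an extended tokenizer) are left untouched here;
        # this skip rule is an assumption for illustration.
        if name in base_state and base_state[name].shape == cp_param.shape:
            cp_param.add_(chat_state[name] - base_state[name])

cp.save_pretrained("cp-model-plus-chat-vector")  # placeholder output path
```

In practice a continually pre-trained model often carries an extended tokenizer, so its embedding and LM-head matrices may not match the base model's shapes; skipping those tensors, as above, is one reasonable choice but not necessarily the one used in the paper's released code.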