LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models (2403.13372v4)
Published 20 Mar 2024 in cs.CL and cs.AI
Abstract: Efficient fine-tuning is vital for adapting large language models (LLMs) to downstream tasks. However, it requires non-trivial effort to implement these methods on different models. We present LlamaFactory, a unified framework that integrates a suite of cutting-edge efficient training methods. It provides a solution for flexibly customizing the fine-tuning of 100+ LLMs without the need for coding, through the built-in web UI LlamaBoard. We empirically validate the efficiency and effectiveness of our framework on language modeling and text generation tasks. It has been released at https://github.com/hiyouga/LLaMA-Factory and has received over 25,000 stars and 3,000 forks.
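To give a concrete sense of the kind of parameter-efficient fine-tuning LlamaFactory unifies, the sketch below attaches a LoRA adapter to a causal language model using the Hugging Face transformers and peft libraries, which are among the building blocks the framework integrates. This is an illustrative sketch, not LlamaFactory's own API: the base model checkpoint, adapter rank, and target modules are assumptions chosen for demonstration.

```python
# Minimal LoRA fine-tuning sketch using Hugging Face transformers + peft.
# Illustrative only: the checkpoint, rank, and target modules below are
# assumed values for demonstration, not LlamaFactory defaults.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, TaskType, get_peft_model

base = "meta-llama/Llama-2-7b-hf"  # assumed base model (gated); any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# LoRA freezes the base weights and trains small low-rank adapter matrices.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                       # adapter rank
    lora_alpha=16,             # scaling factor applied to the adapter update
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the full parameter count
```

The wrapped model can then be passed to any standard training loop or trainer; LlamaBoard, the framework's web UI, exposes this kind of configuration (model, adapter method, dataset, hyperparameters) through forms so that no code has to be written.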