RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment

(2304.06767)
Published Apr 13, 2023 in cs.LG, cs.AI, cs.CL, cs.CV, and stat.ML

Abstract

Generative foundation models are susceptible to implicit biases that can arise from extensive unsupervised training data. Such biases can produce suboptimal samples, skewed outcomes, and unfairness, with potentially serious consequences. Consequently, aligning these models with human ethics and preferences is an essential step toward ensuring their responsible and effective deployment in real-world applications. Prior research has primarily employed Reinforcement Learning from Human Feedback (RLHF) to address this problem, where generative models are fine-tuned with RL algorithms guided by a human-feedback-informed reward model. However, the inefficiencies and instabilities associated with RL algorithms frequently present substantial obstacles to successful alignment, necessitating a more robust and streamlined approach. To this end, we introduce a new framework, Reward rAnked FineTuning (RAFT), designed to align generative models effectively. Given a reward model and a sufficient number of samples, our approach selects the high-quality samples, discards those that exhibit undesired behavior, and subsequently fine-tunes the model on the filtered set. Our studies show that RAFT effectively improves model performance, in terms of both reward scores and other automated metrics, for both LLMs and diffusion models.
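
The abstract describes the RAFT loop at a high level: draw several candidate outputs, rank them with the reward model, keep only the top-ranked samples, and fine-tune on that filtered set. The sketch below is a minimal illustration of that loop under stated assumptions, not the authors' implementation; `generate`, `reward_fn`, `finetune`, `prompts`, and `k` are hypothetical placeholders standing in for the actual generative model, reward model, and training step.

```python
# Minimal sketch of a reward-ranked fine-tuning iteration, as described in the
# abstract: sample k candidates per prompt, score them with a reward model,
# keep the best, and run supervised fine-tuning on the selected pairs.
# All component names here are illustrative placeholders.
import random
from typing import Callable, List, Tuple

def raft_iteration(
    prompts: List[str],
    generate: Callable[[str], str],           # draws one sample from the current model
    reward_fn: Callable[[str, str], float],   # scores a (prompt, response) pair
    finetune: Callable[[List[Tuple[str, str]]], None],  # SFT on the selected pairs
    k: int = 8,                               # candidates drawn per prompt
) -> None:
    selected = []
    for prompt in prompts:
        candidates = [generate(prompt) for _ in range(k)]
        best = max(candidates, key=lambda resp: reward_fn(prompt, resp))
        selected.append((prompt, best))
    # Lower-ranked candidates are discarded; only the reward-ranked winners train the model.
    finetune(selected)

# Toy usage with stand-in components, just to show the data flow.
if __name__ == "__main__":
    rng = random.Random(0)
    toy_generate = lambda p: p + " " + str(rng.random())
    toy_reward = lambda p, r: float(r.split()[-1])  # treat the numeric suffix as the reward
    toy_finetune = lambda pairs: print(f"fine-tuning on {len(pairs)} reward-ranked samples")
    raft_iteration(["hello", "world"], toy_generate, toy_reward, toy_finetune, k=4)
```

In practice the loop above would be repeated for several iterations, with `generate` always sampling from the most recently fine-tuned model; the details of batch sizes, acceptance ratios, and training schedules are specific to the paper and not reproduced here.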

