SLiC-HF: Sequence Likelihood Calibration with Human Feedback

Published 17 May 2023 in cs.CL and cs.AI | (2305.10425v1)

Abstract: Learning from human feedback has been shown to be effective at aligning LLMs with human preferences. Past work has often relied on Reinforcement Learning from Human Feedback (RLHF), which optimizes the LLM using reward scores assigned from a reward model trained on human preference data. In this work we show how the recently introduced Sequence Likelihood Calibration (SLiC) can also be used to effectively learn from human preferences (SLiC-HF). Furthermore, we demonstrate this can be done with human feedback data collected for a different model, similar to off-policy, offline RL data. Automatic and human evaluation experiments on the TL;DR summarization task show that SLiC-HF significantly improves supervised fine-tuning baselines. Furthermore, SLiC-HF presents a competitive alternative to the PPO RLHF implementation used in past work while being much simpler to implement, easier to tune and more computationally efficient in practice.

Summary

The paper introduces SLiC-HF, a method that calibrates sequence likelihoods on pairwise human preference data to align a model without reinforcement learning.
It can reuse human feedback collected for a different model (off-policy data), and it needs less compute and less hyperparameter tuning than RLHF-PPO.
Experiments on the Reddit TL;DR summarization task show SLiC-HF improving on SFT baselines, matching or outperforming RLHF-PPO from prior work, and holding up across model sizes.

Overview of "SLiC-HF: Sequence Likelihood Calibration with Human Feedback"

The paper "SLiC-HF: Sequence Likelihood Calibration with Human Feedback" presents an approach to aligning LLMs with human preferences that avoids reinforcement learning. Alignment of LLM outputs with human judgments has typically relied on Reinforcement Learning from Human Feedback (RLHF), which uses algorithms such as PPO to optimize the model against scores from a reward model trained on human preference data. SLiC-HF instead applies Sequence Likelihood Calibration (SLiC): it ranks decoded sequences using human feedback data and calibrates the model's sequence likelihoods to agree with that ranking, without the complexities associated with reinforcement learning.

Core Contributions

The main contributions of this work are:

Introduction of SLiC-HF: It applies SLiC techniques to leverage human preferences, providing a simpler, more efficient alternative to RLHF. This involves calibrating the sequence likelihoods of a Supervised Fine-Tuned (SFT) model by using pairwise human preference data to rank model outputs.
Utilization of Off-Policy Data: SLiC-HF can effectively use human feedback data collected for different models, akin to off-policy, offline RL data. This removes the need to collect new feedback for each model being trained, which can save cost and simplify the workflow.
Recipe for Implementation: The authors provide detailed guidance on implementing SLiC-HF using open-source tools, demonstrating its viability through extensive experimentation on the Reddit TL;DR dataset.
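The calibration step named in the first contribution can be sketched as a small loss function. This summary does not spell the objective out, so the form below is an assumption based on the hinge-style rank calibration loss with a likelihood regularizer that SLiC is usually described with: the preferred sequence should be at least a margin `delta` more likely than the dispreferred one, while a regularization term weighted by `lam` keeps the model close to the SFT targets. The function names and the default values of `delta` and `lam` are illustrative, not the authors' settings.

```python
from typing import Sequence


def sequence_logprob(token_logprobs: Sequence[float]) -> float:
    """Sum per-token log-probabilities into one sequence log-likelihood."""
    return float(sum(token_logprobs))


def slic_hf_loss(
    logp_pos: float,
    logp_neg: float,
    logp_ref: float,
    delta: float = 1.0,  # margin; illustrative value, not from the paper
    lam: float = 0.5,    # regularization weight; illustrative value
) -> float:
    """Hinge-style rank calibration loss plus a likelihood regularizer.

    logp_pos: log P(y+ | x), the human-preferred sequence
    logp_neg: log P(y- | x), the dispreferred sequence
    logp_ref: log P(y_ref | x), an SFT target used for regularization
    """
    # Zero once the preferred sequence beats the other by at least delta.
    calibration = max(0.0, delta - logp_pos + logp_neg)
    # Negative log-likelihood of the reference keeps the model near SFT.
    regularization = -lam * logp_ref
    return calibration + regularization


# Preferred sequence already wins by more than the margin: only the
# regularizer contributes.
satisfied = slic_hf_loss(logp_pos=-10.0, logp_neg=-12.0, logp_ref=-11.0)

# Preferred sequence is less likely than the dispreferred one: the hinge
# term is active and the loss is larger.
violated = slic_hf_loss(logp_pos=-12.0, logp_neg=-10.0, logp_ref=-11.0)
```

In a real training loop the three log-likelihoods would come from the same model being fine-tuned, and the loss would be averaged over a batch and differentiated by the framework in use.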

Experimental Insights

The authors evaluated SLiC-HF with both automatic and human evaluation:

Quantitative Performance: SLiC-HF models showed substantial improvements over baseline SFT models. Compared with the RLHF-PPO models from prior work, SLiC-HF showed comparable or superior performance, which makes it a viable alternative.
Efficiency Gains: SLiC-HF does not need to keep large auxiliary models (reward and value networks) in memory during training, which reduces compute and memory usage. It is also easier to tune and may converge faster.
Scalability: Experiments across model sizes and calibration configurations suggest that SLiC-HF remains effective as models scale, and that its performance is robust to increasing the number of candidate sequences sampled for ranking.
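The "candidate sequences sampled for ranking" mentioned above can be illustrated with a minimal sketch of how a positive/negative pair might be chosen from a set of sampled candidates. The helper `select_pair` and the toy scorer are hypothetical: in practice the score would come from a ranking or reward model trained on human preference data, and the paper's exact selection procedure may differ from the simple best-versus-worst rule assumed here.

```python
from typing import Callable, Sequence, Tuple


def select_pair(
    candidates: Sequence[str],
    score: Callable[[str], float],
) -> Tuple[str, str]:
    """Return (positive, negative): the best and worst scored candidates.

    `score` stands in for a model trained on human preferences; higher
    means more preferred. Duplicate candidates are dropped first, keeping
    the order in which they were sampled.
    """
    unique = list(dict.fromkeys(candidates))
    if len(unique) < 2:
        raise ValueError("need at least two distinct candidates to form a pair")
    ranked = sorted(unique, key=score, reverse=True)
    return ranked[0], ranked[-1]


def toy_score(summary: str) -> float:
    """Toy scorer for illustration only: longer summaries score higher."""
    return float(len(summary))


sampled = ["a bb", "a", "a bb ccc", "a"]
positive, negative = select_pair(sampled, toy_score)
```

Sampling more candidates widens the pool from which the pair is drawn; the selected pair then feeds the calibration loss as the preferred and dispreferred sequences.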

Theoretical Implications and Future Directions

SLiC-HF suggests that ranking-based calibration can align model outputs with human preferences as effectively as reinforcement learning approaches. This matters because traditional RL methods need pointwise rewards, and converting pairwise human judgments into such rewards can introduce noise.
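To make the pairwise-to-pointwise step concrete, here is one common way it is done in RLHF pipelines. This is an assumption for illustration (a Bradley-Terry style reward model), not a formula taken from this summary: a scalar reward $r_\phi(x, y)$ is fitted so that its differences explain the observed pairwise preferences, and the RL stage then maximizes that scalar.

```latex
% Assumed Bradley-Terry style preference model:
P(y_1 \succ y_2 \mid x) = \sigma\big(r_\phi(x, y_1) - r_\phi(x, y_2)\big)

% Reward-model training loss on a human-labelled pair (y^+, y^-):
\mathcal{L}_{\mathrm{RM}}(\phi) = -\log \sigma\big(r_\phi(x, y^+) - r_\phi(x, y^-)\big)
```

Any error in the fitted $r_\phi$ carries over into the pointwise reward that RL optimizes, which is the potential source of noise referred to here.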

From a theoretical perspective, SLiC-HF shows that pairwise feedback can be integrated directly into a supervised calibration framework, bypassing the complexities of traditional RL. Practically, the method could be extended to other natural language processing tasks where human preference alignment is critical, such as dialogue systems and creative content generation.

Future research could explore SLiC-HF's adaptability to tasks beyond summarization and investigate its integration with non-human feedback, such as synthetic data or machine-generated judgments, to understand the broader applicability of pairwise feedback-driven calibration.
