A Probability–Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors (2406.10203v4)
Abstract: The relationship between the quality of a string, as judged by a human reader, and its probability $p(\boldsymbol{y})$ under an LLM undergirds the development of better LLMs. For example, many popular algorithms for sampling from an LLM have been conceived with the goal of manipulating $p(\boldsymbol{y})$ to place higher probability on strings that humans deem high quality. In this article, we examine the probability–quality relationship in LLMs explicitly aligned to human preferences, e.g., through reinforcement learning from human feedback (RLHF). We show that, when sampling corpora from an aligned LLM, there exists a trade-off between the strings' average reward and their average log-likelihood under the prior LLM, i.e., the same model before alignment with human preferences. We provide a formal treatment of this phenomenon and demonstrate how the choice of sampling adaptor allows us to select how much likelihood we exchange for reward.
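As a rough illustration of the sampling adaptors the abstract refers to (the paper's formal treatment is not reproduced here), the sketch below applies two common adaptors, temperature scaling and top-k truncation, to a toy next-token distribution and reports the sampled token's log-probability under the original distribution. The function names and the toy distribution are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def temperature_adaptor(p, tau=0.8):
    """Rescale a next-token distribution p with temperature tau and renormalize."""
    logits = np.log(p + 1e-12) / tau
    q = np.exp(logits - logits.max())
    return q / q.sum()

def top_k_adaptor(p, k=5):
    """Keep only the k most probable tokens, then renormalize."""
    q = np.zeros_like(p)
    top = np.argsort(p)[-k:]
    q[top] = p[top]
    return q / q.sum()

# Toy next-token distribution over a 10-token vocabulary.
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(10))

for name, q in [("original", p),
                ("temperature 0.8", temperature_adaptor(p, 0.8)),
                ("top-k (k=5)", top_k_adaptor(p, 5))]:
    token = rng.choice(len(q), p=q)
    print(f"{name}: sampled token {token}, "
          f"log-prob under original distribution {np.log(p[token]):.3f}")
```

Adaptors like these reshape the distribution strings are drawn from without changing the underlying model, which is why, per the abstract, the choice of adaptor governs how much prior log-likelihood is exchanged for reward.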