Pragmatic Feature Preferences: Learning Reward-Relevant Preferences from Human Input

Published 23 May 2024 in cs.LG and cs.CL | (2405.14769v1)

Abstract: Humans use social context to specify preferences over behaviors, i.e. their reward functions. Yet, algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human communication, we study how to extract fine-grained data regarding why an example is preferred that is useful for learning more accurate reward models. We propose to enrich binary preference queries to ask both (1) which features of a given example are preferable in addition to (2) comparisons between examples themselves. We derive an approach for learning from these feature-level preferences, both for cases where users specify which features are reward-relevant, and when users do not. We evaluate our approach on linear bandit settings in both vision- and language-based domains. Results support the efficiency of our approach in quickly converging to accurate rewards with fewer comparisons vs. example-only labels. Finally, we validate the real-world applicability with a behavioral experiment on a mushroom foraging task. Our findings suggest that incorporating pragmatic feature preferences is a promising approach for more efficient user-aligned reward learning.

Summary

The paper presents a method that augments binary preference queries with pragmatic feature-level feedback to learn more accurate reward functions.
The method combines example-level and feature-level queries, which leads to faster convergence in both vision and language tasks.
The paper validates the approach with a user study, which indicates that the richer feedback reduces the number of comparisons needed without increasing user burden.

The paper presents an approach to learning reward functions from human preferences that draws on pragmatic human communication to collect finer-grained data. Conventional binary preference queries ask only which of two examples is preferred. The authors enrich these queries by also asking which features of the examples are preferable. The aim is a more accurate reward model that captures why a particular example is preferred.

Methodology

The approach hinges on two types of queries: example-level and feature-level. The former matches the comparisons used in traditional Reinforcement Learning from Human Feedback (RLHF), while the latter asks users which specific features influence their preferences. By combining these queries, the model can infer both the explicit preferences and the implicit indifferences a user has towards certain features, which yields a richer dataset from the same number of comparisons. This feature augmentation, driven by pragmatic language descriptions, is what distinguishes the method from existing RLHF approaches that learn from example-level labels alone.
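The way the two query types combine can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a linear reward r(x) = w · φ(x), a Bradley-Terry (logistic) likelihood for comparisons, and it encodes a feature-level answer as a comparison restricted to a single coordinate. The names `fit_reward` and `feature_level_rows`, the label 0.5 for indifference, and the toy data are all hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_reward(rows, labels, n_steps=2000, lr=0.5):
    """Fit linear reward weights by gradient descent on the logistic loss.

    rows:   (n, d) array, each row is phi(a) - phi(b) for one comparison.
    labels: (n,) array in [0, 1]; 1.0 means a is preferred, 0.0 means b
            is preferred, 0.5 means the user is indifferent (assumption).
    """
    rows = np.asarray(rows, dtype=float)
    labels = np.asarray(labels, dtype=float)
    w = np.zeros(rows.shape[1])
    for _ in range(n_steps):
        # Gradient of the mean cross-entropy between sigmoid(rows @ w) and labels.
        w -= lr * rows.T @ (sigmoid(rows @ w) - labels) / len(labels)
    return w

def feature_level_rows(phi_a, phi_b, feature_labels):
    """Turn per-feature answers into single-coordinate comparison rows.

    feature_labels: dict {feature index: label}, same label convention
    as fit_reward (hypothetical encoding).
    """
    rows, labels = [], []
    for i, y in feature_labels.items():
        row = np.zeros(len(phi_a))
        row[i] = phi_a[i] - phi_b[i]  # only feature i differs in this row
        rows.append(row)
        labels.append(y)
    return rows, labels

# Toy query: the user prefers example a overall, likes a's value of
# feature 0, is indifferent to feature 1, and likes a's value of feature 2.
phi_a = np.array([1.0, 1.0, 0.0])
phi_b = np.array([0.0, 0.0, 1.0])
example_rows = [phi_a - phi_b]
example_labels = [1.0]
feat_rows, feat_labels = feature_level_rows(phi_a, phi_b, {0: 1.0, 1: 0.5, 2: 1.0})

w = fit_reward(example_rows + feat_rows, example_labels + feat_labels)
```

With only the example-level row, the three features would receive weights of equal magnitude; the feature-level rows pull the weight of the indifferent feature towards zero while reinforcing the other two.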

The authors evaluate their method in linear bandit settings in two domains: a vision-based mushroom foraging task and a language-based flight booking task. In both, the approach converges to the true reward function with fewer comparisons than methods that use only example-level feedback. A user study on the mushroom foraging task supports these findings and the method's applicability with real users.
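A small synthetic simulation gives a feel for this kind of evaluation. It is a sketch under assumptions rather than a reproduction of the paper's experiments: the simulated user is noiseless, features are Gaussian, the reward is linear and sparse, and recovery is scored by cosine similarity to the true weights. The function names, the metric, and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_reward(rows, labels, n_steps=2000, lr=0.5):
    """Gradient descent on the logistic loss for linear reward weights."""
    rows = np.asarray(rows, dtype=float)
    labels = np.asarray(labels, dtype=float)
    w = np.zeros(rows.shape[1])
    for _ in range(n_steps):
        w -= lr * rows.T @ (sigmoid(rows @ w) - labels) / len(labels)
    return w

def cosine(u, v):
    """Cosine similarity, with a small constant to avoid division by zero."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def simulate(n_pairs=10, d=8, n_relevant=2, seed=0):
    """Compare example-only labels with example plus feature-level labels."""
    rng = np.random.default_rng(seed)
    w_true = np.zeros(d)
    w_true[:n_relevant] = rng.choice([-1.0, 1.0], size=n_relevant)

    diffs = rng.normal(size=(n_pairs, d))           # phi(a) - phi(b) per query
    ex_labels = (diffs @ w_true > 0).astype(float)  # noiseless simulated user

    # Feature-level answers: one single-coordinate row per feature and query.
    feat_rows, feat_labels = [], []
    for diff in diffs:
        for i in range(d):
            row = np.zeros(d)
            row[i] = diff[i]
            feat_rows.append(row)
            if w_true[i] == 0:
                feat_labels.append(0.5)             # irrelevant: indifferent
            else:
                feat_labels.append(float(w_true[i] * diff[i] > 0))

    w_ex = fit_reward(diffs, ex_labels)
    w_aug = fit_reward(np.vstack([diffs, np.array(feat_rows)]),
                       np.concatenate([ex_labels, np.array(feat_labels)]))
    return {"example_only": cosine(w_ex, w_true),
            "with_features": cosine(w_aug, w_true)}

result = simulate()
```

Sweeping `n_pairs` and plotting both entries of `result` would give a rough analogue of a convergence curve; any gap observed here reflects this toy setup, not the paper's reported numbers.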

Key Findings

Efficiency: The pragmatic feature preference model needs fewer comparisons to converge to accurate reward predictions than approaches that learn from example-level labels alone.
Feature Sparsity: The advantage of the proposed method over baseline models is especially pronounced when reward functions are sparse, meaning that only a few features are reward-relevant.
User Study: In the user study, participants did not find feature-level feedback more burdensome to provide, and the augmented queries did not add significant effort.
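One way to see why sparsity helps is to treat features the user marks as irrelevant (or, read pragmatically, leaves unmentioned) as a source of extra training data: examples that differ only in those features should be rated as equally good. The sketch below illustrates that idea under assumptions. The function `indifference_rows`, the Gaussian perturbation, and the 0.5 label are hypothetical and are not taken from the paper.

```python
import numpy as np

def indifference_rows(phi, irrelevant, n_aug=5, scale=1.0, seed=0):
    """Build synthetic comparisons that differ only in irrelevant features.

    phi:        feature vector of an observed example.
    irrelevant: indices of features assumed not to affect the reward.
    Returns the perturbed examples, the difference rows
    (perturbed - phi), and a label of 0.5 (indifference) for each row.
    """
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    idx = list(irrelevant)
    rows = np.zeros((n_aug, phi.size))
    # Perturb only the irrelevant coordinates; all others stay at zero.
    rows[:, idx] = rng.normal(scale=scale, size=(n_aug, len(idx)))
    perturbed = phi + rows
    labels = np.full(n_aug, 0.5)
    return perturbed, rows, labels

phi = np.array([1.0, 2.0, 3.0, 4.0])
perturbed, rows, labels = indifference_rows(phi, irrelevant=[1, 3])
```

The more features are irrelevant, the more such indifference rows a single answer can generate, which is consistent with the larger advantage reported for sparse reward functions. The rows and labels can be appended to the comparison data of any logistic preference model that accepts soft labels.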

Implications

The implications of this research are twofold. Practically, the enriched preference data can lead to more human-aligned AI systems that learn efficiently from limited data. This is particularly beneficial in applications where repeatedly querying users for feedback is costly or impractical. Theoretically, it suggests a richer model of human-AI interaction in which users are treated not just as oracles providing binary labels but as teachers whose input guides the learning process in finer detail.

Future Directions

The proposed method opens several avenues for future research. One is to apply the method to more complex, high-dimensional environments and assess how well the pragmatic approach scales. Another concerns the assumption that users provide clear feature-level feedback, which may not always hold. Mechanisms that handle ambiguity in human communication, or that use AI to better interpret human input, would therefore be valuable extensions.

In conclusion, pragmatic feature preferences add a promising dimension to reward function learning by drawing on human communication strategies to align AI systems with human values more efficiently. The approach highlights the potential of pragmatic communication models for building AI systems that learn from and adapt to human feedback.
