Fairness in Learning: Classic and Contextual Bandits

Published 23 May 2016 in cs.LG and stat.ML | (1605.07139v2)

Abstract: We introduce the study of fairness in multi-armed bandit problems. Our fairness definition can be interpreted as demanding that given a pool of applicants (say, for college admission or mortgages), a worse applicant is never favored over a better one, despite a learning algorithm's uncertainty over the true payoffs. We prove results of two types. First, in the important special case of the classic stochastic bandits problem (i.e., in which there are no contexts), we provide a provably fair algorithm based on "chained" confidence intervals, and provide a cumulative regret bound with a cubic dependence on the number of arms. We further show that any fair algorithm must have such a dependence. When combined with regret bounds for standard non-fair algorithms such as UCB, this proves a strong separation between fair and unfair learning, which extends to the general contextual case. In the general contextual case, we prove a tight connection between fairness and the KWIK (Knows What It Knows) learning model: a KWIK algorithm for a class of functions can be transformed into a provably fair contextual bandit algorithm, and conversely any fair contextual bandit algorithm can be transformed into a KWIK learning algorithm. This tight connection allows us to provide a provably fair algorithm for the linear contextual bandit problem with a polynomial dependence on the dimension, and to show (for a different class of functions) a worst-case exponential gap in regret between fair and non-fair learning algorithms.

Authors (4)

Citations (456)

Summary

The paper introduces a fair algorithm for classic bandits using chained confidence intervals, resulting in a cumulative regret bound with cubic dependence on the number of arms.
It establishes a tight connection between fairness and the KWIK learning model, converting KWIK algorithms into fair contextual bandit strategies with polynomial regret in certain settings.
The study highlights practical implications for reducing discrimination in decision-making systems such as college admissions, lending, and hiring, and quantifies the cost in regret that the fairness constraint imposes.

The paper "Fairness in Learning: Classic and Contextual Bandits" studies how to build fairness constraints into multi-armed bandit (MAB) models. Its fairness definition requires that a worse applicant is never favored over a better one, even while the learning algorithm is still uncertain about the true rewards. The paper examines this requirement in both the classic stochastic and the contextual bandit settings.

Key Contributions

The authors present several major findings that elucidate the trade-offs between fairness and regret in learning algorithms:

Classic Stochastic Bandits: For the special case without contexts, the paper proposes a fair algorithm based on chained confidence intervals. This method yields a cumulative regret bound with a cubic dependence on the number of arms. The paper proves that any fair algorithm must have such a dependence, which separates the regret achievable by fair algorithms from that of non-fair algorithms such as UCB.
Contextual Bandits: The paper establishes a tight connection between fairness and the KWIK (Knows What It Knows) learning model. It demonstrates that a KWIK algorithm for a class of functions can be converted into a fair contextual bandit algorithm, and vice versa. This link allows for constructing fair algorithms with polynomial regret in certain settings, such as linear contextual bandits, while showing possible exponential gaps in regret for others.
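
The chaining idea and the KWIK-based selection can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm as published: the function names (`top_chain`, `fair_pull`, `fair_pull_from_kwik`), the use of `None` to stand for a KWIK learner's "don't know" answer, and the fixed interval half-width `eps` are all hypothetical choices made for this example. The sketch treats two arms as linked when their confidence intervals overlap, takes the transitive closure starting from the arm with the highest upper bound, and picks uniformly at random within that chain, so no arm that might still be the best is disfavored.

```python
import random
from typing import List, Optional, Sequence


def top_chain(lower: Sequence[float], upper: Sequence[float]) -> List[int]:
    """Return the indices of arms chained to the arm with the highest upper bound.

    Two arms are linked when their intervals overlap; the chain is the
    transitive closure of that relation, starting from the top arm.
    """
    k = len(lower)
    top = max(range(k), key=lambda i: upper[i])
    chain = {top}
    frontier = [top]
    while frontier:
        i = frontier.pop()
        for j in range(k):
            overlaps = lower[i] <= upper[j] and lower[j] <= upper[i]
            if j not in chain and overlaps:
                chain.add(j)
                frontier.append(j)
    return sorted(chain)


def fair_pull(lower: Sequence[float], upper: Sequence[float],
              rng: random.Random) -> int:
    """Pick uniformly at random among the arms in the top chain."""
    return rng.choice(top_chain(lower, upper))


def fair_pull_from_kwik(predictions: Sequence[Optional[float]], eps: float,
                        rng: random.Random) -> int:
    """Select an arm from KWIK-style predictions (None means "don't know").

    Assumption for this sketch: every non-None prediction is accurate to
    within eps, so [p - eps, p + eps] serves as a confidence interval.
    """
    if any(p is None for p in predictions):
        # Some arm is still unknown: treat all arms alike.
        return rng.randrange(len(predictions))
    lower = [p - eps for p in predictions]
    upper = [p + eps for p in predictions]
    return fair_pull(lower, upper, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    # Arm 0's interval overlaps arm 1's, which overlaps arm 2's: one chain.
    print(top_chain([0.0, 0.4, 0.9], [0.5, 1.0, 1.5]))
    # Arm 0 is now separated from the others, so it drops out of the chain.
    print(top_chain([0.0, 0.6, 0.9], [0.5, 1.0, 1.5]))
    print(fair_pull_from_kwik([0.2, 0.9], 0.05, rng))
```

Choosing uniformly within the chain, rather than greedily by upper bound as UCB does, is what makes exploration slower: an arm leaves the active set only once its interval has separated from every arm in the chain.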

Numerical Results and Claims

A central claim is that, in the classic bandit case, fair algorithms can achieve non-trivial regret only after $\Omega(k^3)$ rounds, where $k$ is the number of arms. By contrast, standard algorithms without fairness constraints achieve non-trivial regret after only $O(k)$ rounds.
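
A short calculation shows how a cubic dependence in a regret bound translates into a round count of order $k^3$. It is a sketch under assumptions: the bounds are taken to have the forms below, $C$ and $C'$ are hypothetical constants, $\epsilon$ is a hypothetical target for average regret, and logarithmic factors are suppressed. It explains the upper-bound side only; the matching $\Omega(k^3)$ lower bound is a separate result of the paper.

$$
R_{\text{fair}}(T) \le C\sqrt{k^3 T}
\;\Longrightarrow\;
\frac{R_{\text{fair}}(T)}{T} \le C\sqrt{\frac{k^3}{T}} \le \epsilon
\quad\text{once}\quad T \ge \frac{C^2 k^3}{\epsilon^2},
$$

$$
R_{\text{std}}(T) \le C'\sqrt{k T}
\;\Longrightarrow\;
\frac{R_{\text{std}}(T)}{T} \le C'\sqrt{\frac{k}{T}} \le \epsilon
\quad\text{once}\quad T \ge \frac{C'^2 k}{\epsilon^2}.
$$

For fixed $\epsilon$, the first threshold grows as $k^3$ and the second as $k$, which is the separation described above.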

Furthermore, the paper identifies scenarios within the contextual bandit setting where fairness imposes an exponential cost. Specifically, when the target functions are conjunctions, fair algorithms incur worst-case regret that is exponential in the problem dimension $d$.

Implications and Future Directions

The exploration of fairness in MAB problems bridges a critical gap in the deployment of learning algorithms in socially sensitive domains like college admissions, lending, and hiring. The findings have several implications:

Theoretical Implications: The paper enriches the understanding of the trade-offs between fairness constraints and learning efficiency. The methodologies proposed for fair bandits could influence future developments in algorithm fairness across other reinforcement learning models.
Practical Implications: For deployed systems, the algorithms offer a framework for mitigating the risk of discrimination, with provable bounds on the regret incurred in exchange. This could matter in applications where automated decision-making raises ethical concerns.
Future Research: There is potential to explore more complex decision-making scenarios, including those with dynamic environments or involving multiple fairness definitions. Additionally, extending these models to deep reinforcement learning paradigms presents an intriguing avenue of research.

Conclusion

The paper contributes a novel perspective on integrating fairness within MAB frameworks and lays groundwork for further exploration of ethical considerations in machine learning. By establishing a tight connection to the KWIK model, the authors provide a foundation for designing fair algorithms and for characterizing when fairness is affordable (polynomial regret, as in the linear contextual case) and when it is costly (exponential regret, as for conjunctions).