A Survey of Reinforcement Learning from Human Feedback (2312.14925v2)
Abstract: Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of large language models (LLMs) has impressively demonstrated this potential in recent years, with RLHF playing a decisive role in directing the models' capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between RL agents and human input. While recent work has focused on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shed light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to give researchers and practitioners a comprehensive understanding of this rapidly growing field.
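To make the setting concrete, here is a minimal sketch of the reward-modeling step that most RLHF pipelines share: a small reward network is fitted to pairwise human preferences over trajectory segments via the Bradley-Terry model, and the learned reward then stands in for an engineered reward function during policy optimization. The sketch assumes PyTorch, and every name in it (RewardNet, obs_dim, the segment tensors) is illustrative rather than taken from any particular surveyed work.

```python
# Minimal sketch of preference-based reward learning (the reward-modeling
# stage of RLHF). Hypothetical names and shapes; PyTorch for illustration.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Maps per-step observation features to a scalar reward estimate."""
    def __init__(self, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, segment: torch.Tensor) -> torch.Tensor:
        # segment: (T, obs_dim) -> predicted return of the whole segment
        return self.net(segment).sum()

def preference_loss(r_hat: RewardNet,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    pref_a: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry negative log-likelihood of one human preference label.

    Models P(A preferred over B) = exp(R(A)) / (exp(R(A)) + exp(R(B))),
    where R is the predicted segment return; pref_a is 1.0 if the human
    preferred segment A, 0.0 if they preferred segment B.
    """
    log_probs = torch.log_softmax(
        torch.stack([r_hat(seg_a), r_hat(seg_b)]), dim=0)
    return -(pref_a * log_probs[0] + (1.0 - pref_a) * log_probs[1])

# Toy usage: one gradient step on a single labeled segment pair.
obs_dim = 8
r_hat = RewardNet(obs_dim)
optimizer = torch.optim.Adam(r_hat.parameters(), lr=3e-4)
seg_a, seg_b = torch.randn(50, obs_dim), torch.randn(50, obs_dim)
label = torch.tensor(1.0)  # annotator preferred segment A
loss = preference_loss(r_hat, seg_a, seg_b, label)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In a full pipeline this loss would be minimized over a growing dataset of labeled segment pairs, often with an ensemble of reward networks and active selection of which pairs to query, before or alongside policy optimization against the learned reward.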