Can large language models explore in-context? (2403.15371v3)
Abstract: We investigate the extent to which contemporary LLMs can engage in exploration, a core capability in reinforcement learning and decision making. We focus on the native performance of existing LLMs, without training interventions. We deploy LLMs as agents in simple multi-armed bandit environments, specifying the environment description and interaction history entirely in-context, i.e., within the LLM prompt. We experiment with GPT-3.5, GPT-4, and Llama 2, using a variety of prompt designs, and find that the models do not robustly engage in exploration without substantial interventions: i) across all of our experiments, only one configuration resulted in satisfactory exploratory behavior: GPT-4 with chain-of-thought reasoning and an externally summarized interaction history, presented as sufficient statistics; ii) all other configurations did not result in robust exploratory behavior, including those with chain-of-thought reasoning but unsummarized history. Although these findings can be interpreted positively, they suggest that external summarization -- which may not be possible in more complex settings -- is important for obtaining desirable behavior from LLM agents. We conclude that non-trivial algorithmic interventions, such as fine-tuning or dataset curation, may be required to empower LLM-based decision-making agents in complex settings.
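The setup described in the abstract can be illustrated with a minimal sketch: a Bernoulli multi-armed bandit whose interaction history is summarized externally into per-arm sufficient statistics (pull counts and empirical mean rewards), the kind of summary the paper finds necessary for GPT-4 to explore. This is not the paper's exact protocol; the class names, arm means, and summary wording below are illustrative assumptions.

```python
import random

class BernoulliBandit:
    """Bernoulli bandit that tracks per-arm sufficient statistics,
    the externally summarized history an LLM agent would see in-context.
    (Illustrative sketch; not the paper's exact environment or prompt.)"""

    def __init__(self, means, seed=0):
        self.means = means                  # true (hidden) success probabilities
        self.rng = random.Random(seed)
        self.counts = [0] * len(means)      # pulls per arm
        self.sums = [0.0] * len(means)      # total reward per arm

    def pull(self, arm):
        """Pull an arm, receive a 0/1 reward, and update the statistics."""
        reward = 1.0 if self.rng.random() < self.means[arm] else 0.0
        self.counts[arm] += 1
        self.sums[arm] += reward
        return reward

    def summarized_history(self):
        """Render the full history as per-arm sufficient statistics --
        the compressed representation that would go into the LLM prompt."""
        lines = []
        for arm, (n, s) in enumerate(zip(self.counts, self.sums)):
            mean = s / n if n else 0.0
            lines.append(f"arm {arm}: pulled {n} times, mean reward {mean:.2f}")
        return "\n".join(lines)

bandit = BernoulliBandit(means=[0.4, 0.6])
for t in range(10):
    bandit.pull(t % 2)  # placeholder round-robin policy; in the paper, the LLM
                        # chooses the next arm given the summary below
print(bandit.summarized_history())
```

The contrast the paper draws is between feeding the agent this compact summary versus the raw, unsummarized list of (arm, reward) pairs; only the summarized form, combined with chain-of-thought prompting, elicited satisfactory exploration from GPT-4.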
Authors: Akshay Krishnamurthy, Keegan Harris, Dylan J. Foster, Cyril Zhang, Aleksandrs Slivkins