- The paper introduces CALM, which combines pre-trained language models with reinforcement learning to significantly reduce the action space in text-based games.
- It demonstrates a 69% relative improvement in game scores on the Jericho benchmark, outperforming state-of-the-art methods without relying on ground-truth admissible actions.
- The work underscores the value of training on human gameplay transcripts to build models whose decision-making generalizes across complex, interactive environments.
Keep CALM and Explore: Language Models for Action Generation in Text-based Games
This paper addresses the challenges posed by text-based games, which require autonomous agents to process natural language and navigate vast action spaces. The researchers propose the Contextual Action Language Model (CALM), which generates a concise set of candidate actions at each game state; it is trained on human gameplay transcripts so that it acquires linguistic priors over which actions are sensible given the game history.
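A minimal sketch of this generation step is shown below, using the Hugging Face transformers GPT-2 interface; the checkpoint, prompt format, and sampling settings are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of CALM-style candidate action generation with a GPT-2 language model.
# In practice the model would be fine-tuned on gameplay transcripts; here the
# base "gpt2" checkpoint and the prompt format stand in as placeholders.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # swap in fine-tuned weights
tokenizer.pad_token = tokenizer.eos_token

def generate_candidate_actions(game_history: str, k: int = 30) -> list[str]:
    """Sample k short action strings conditioned on the recent game history."""
    inputs = tokenizer(game_history, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,            # sampling keeps the candidate set diverse
        top_k=40,
        max_new_tokens=8,          # game commands are short ("open mailbox", ...)
        num_return_sequences=k,
        pad_token_id=tokenizer.eos_token_id,
    )
    prompt_len = inputs["input_ids"].shape[1]
    actions = [
        tokenizer.decode(seq[prompt_len:], skip_special_tokens=True).strip().split("\n")[0]
        for seq in outputs
    ]
    return list(dict.fromkeys(a for a in actions if a))  # de-duplicate, drop empties

candidates = generate_candidate_actions(
    "You are standing in an open field west of a white house. There is a mailbox here. >"
)
```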
CALM operates by generating candidate actions that a reinforcement learning (RL) agent then ranks to optimize in-game reward. The action space in text-based games is combinatorially large: commands are multi-word strings drawn from a sizable vocabulary, so evaluating every possibility is computationally prohibitive. CALM narrows this space to contextually relevant actions, learned from human players whose transcripts encode gameplay intuition.
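The re-ranking step can be pictured as a small Q-network over (state, action) pairs, in the spirit of DRRN; the encoders, dimensions, and class below are illustrative assumptions, not the paper's implementation.

```python
# Sketch of re-ranking CALM's candidates with a DRRN-style Q-network: the
# language model proposes a small set of actions, and the RL agent picks the
# one with the highest estimated value. Encoders and dimensions are placeholders.
import torch
import torch.nn as nn

class DRRNScorer(nn.Module):
    """Scores (state, action) pairs; trained by the RL agent from game rewards."""
    def __init__(self, state_dim: int = 128, action_dim: int = 128):
        super().__init__()
        self.q_head = nn.Sequential(
            nn.Linear(state_dim + action_dim, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, state_vec: torch.Tensor, action_vecs: torch.Tensor) -> torch.Tensor:
        # state_vec: (state_dim,); action_vecs: (num_candidates, action_dim).
        expanded = state_vec.expand(action_vecs.size(0), -1)
        return self.q_head(torch.cat([expanded, action_vecs], dim=-1)).squeeze(-1)

def select_action(scorer: DRRNScorer, state_vec: torch.Tensor,
                  candidates: list[str], encode_action) -> str:
    """Pick the candidate whose estimated Q-value is highest."""
    action_vecs = torch.stack([encode_action(a) for a in candidates])
    q_values = scorer(state_vec, action_vecs)
    return candidates[int(torch.argmax(q_values))]
```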
The paper evaluates CALM on the Jericho benchmark, a suite of text-based games. The results show a substantial 69% relative improvement in average game score over the previous state of the art. Remarkably, on half of the games tested, CALM performs as well as or better than models that rely on ground-truth admissible actions, indicating that it can separate relevant from infeasible actions without access to that external information.
The approach builds on a pre-trained language model, specifically a variant of GPT-2, which is further trained on a new dataset of human gameplay transcripts covering 590 different games. This training allows the model to develop a sense of which actions are likely to succeed in unseen games. The authors then combine CALM with the Deep Reinforcement Relevance Network (DRRN), highlighting the synergy between linguistic priors and adaptive action selection driven by in-game rewards.
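As a rough illustration of what such fine-tuning data might look like, the sketch below turns one game's transcript into context-action training strings; the [OBS]/[ACT] delimiters and the context window are assumptions, not the paper's exact encoding.

```python
# Sketch of converting a human gameplay transcript into language-model training
# text: each example conditions on a few recent turns and ends with the human's
# chosen action as the prediction target. Delimiters here are illustrative.
def build_training_examples(transcript: list[tuple[str, str]], window: int = 3) -> list[str]:
    """transcript: list of (observation, human_action) turns from one game."""
    examples = []
    for i, (_, action) in enumerate(transcript):
        context = transcript[max(0, i - window + 1): i + 1]
        history = " ".join(f"[OBS] {obs} [ACT] {act}" for obs, act in context[:-1])
        history += f" [OBS] {context[-1][0]} [ACT]"
        examples.append(f"{history.strip()} {action}")
    return examples

examples = build_training_examples([
    ("West of House. There is a mailbox here.", "open mailbox"),
    ("Opening the mailbox reveals a leaflet.", "read leaflet"),
])
```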
In analyzing the results, the paper emphasizes several points. Producing admissible actions is hard because only a small fraction of the enormous number of possible word combinations actually advances the game. CALM's strong performance without the admissible-action handicap suggests that language models can capture and generalize human-like decision-making in complex environments. The analysis also examines the balance between exploring novel actions and exploiting reliable ones, which is critical for optimizing game strategies.
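One generic way to picture that trade-off over CALM's candidate set is to sample actions from a temperature-scaled softmax over Q-values rather than always taking the argmax; this is a common exploration scheme and only an illustration, not necessarily the paper's exact mechanism.

```python
# Illustration of explore-vs-exploit over a fixed candidate set: sampling from a
# temperature-scaled softmax over Q-values explores more as temperature rises,
# and approaches greedy (pure exploitation) as temperature shrinks toward zero.
import torch

def sample_action(q_values: torch.Tensor, candidates: list[str],
                  temperature: float = 1.0) -> str:
    probs = torch.softmax(q_values / temperature, dim=-1)
    idx = int(torch.multinomial(probs, num_samples=1).item())
    return candidates[idx]
```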
The implications of this research are notable, suggesting that integrating linguistic models with RL strategies offers a pathway to more efficient and effective autonomous agents in sequential decision-making environments. The potential to train such models on diverse datasets implies broad applicability, extending beyond gaming to any domain where understanding and navigating extensive action spaces is required.
Looking forward, future work could focus on scaling CALM and integrating it with other RL techniques. Exploring alternative dataset sources, further model pre-training, and adaptation to different game genres could also shed light on the robustness and flexibility of language models for action generation. Overall, this paper contributes meaningfully to the discourse on AI systems that blend language understanding with learned gameplay competence, in line with broader trends toward agents capable of complex reasoning and decision-making in dynamic, interactive environments.