Stick-Breaking Policy Learning in Dec-POMDPs (1505.00274v2)
Abstract: Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to local maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called \emph{decentralized stick-breaking policy representation} (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
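To make the stick-breaking idea concrete, below is a minimal Python sketch of the generic truncated stick-breaking construction that underlies such variable-size priors: each stick proportion is drawn from a Beta distribution, and the weight of node k is the piece broken off the remaining stick. The Beta(1, alpha) parameterization, the `max_nodes` truncation, and all names are illustrative assumptions, not the paper's exact prior or inference procedure.

```python
import numpy as np

def stick_breaking_weights(alpha, max_nodes, rng=None):
    """Truncated stick-breaking construction (illustrative, not the
    paper's exact prior): draw v_k ~ Beta(1, alpha) and set
    w_k = v_k * prod_{j<k} (1 - v_j)."""
    rng = np.random.default_rng(rng)
    v = rng.beta(1.0, alpha, size=max_nodes)
    # Length of stick remaining before each break: 1, (1-v_1), (1-v_1)(1-v_2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - v[:-1])])
    return v * remaining

# Example: weights over candidate controller nodes. Mass decays with k,
# so the effective number of FSC nodes adapts rather than being fixed.
w = stick_breaking_weights(alpha=2.0, max_nodes=10)
print(w, w.sum())  # weights sum to < 1; the remainder is the unbroken stick
```

Because later weights shrink geometrically in expectation, most probability mass concentrates on a finite prefix of nodes, which is how a stick-breaking prior lets the controller size be inferred from data instead of chosen in advance.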