Emergent Mind

Myopic Policy Bounds for Information Acquisition POMDPs

arXiv:1601.07279
Published Jan 27, 2016 in cs.SY

Abstract

This paper addresses the optimal control of robotic sensing systems for autonomous information gathering in scenarios such as environmental monitoring, search and rescue, and surveillance and reconnaissance. The information-gathering problem is formulated as a partially observable Markov decision process (POMDP) with a reward function that captures uncertainty reduction. Unlike in the classical POMDP formulation, the resulting reward is nonlinear in the belief state, so traditional solution approaches do not apply directly. Instead of developing a new approximation algorithm, we show that if attention is restricted to a class of problems with certain structural properties, one can derive (often tight) upper and lower bounds on the optimal policy via an efficient myopic computation. These policy bounds can be combined with an online branch-and-bound algorithm to accelerate computation of the optimal policy. In a target-tracking domain, we obtain informative lower and upper policy bounds with low computational effort. The performance of the branch-and-bound approach is demonstrated and compared with exact value iteration.
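To make the setting concrete, the following is a minimal sketch of an information-gathering POMDP in the spirit of the paper's target-tracking domain. All numbers and the binary-detection sensor model are illustrative assumptions, not taken from the paper, and the paper's actual policy bounds rely on structural properties not reproduced here. The sketch only shows the two ingredients the abstract contrasts: the cheap myopic (one-step information-greedy) policy from which the bounds are computed, and the exhaustive expectimax search whose cost branch-and-bound is meant to reduce. Note the per-step reward is mutual information, which is nonlinear in the belief.

```python
import math

# Toy setup (illustrative assumptions): a static target occupies one of N
# cells; action a inspects cell a and returns a binary detection z. The
# per-step reward is the mutual information I(x; z | b, a), i.e. the
# expected entropy reduction of the belief b.
N, PD, PF = 4, 0.9, 0.1          # cells, detection prob., false-alarm prob.

def entropy(b):
    return -sum(p * math.log2(p) for p in b if p > 0.0)

def obs_lik(a, s, z):
    """P(z | target in cell s, inspect cell a), with z in {0, 1}."""
    p_det = PD if s == a else PF
    return p_det if z == 1 else 1.0 - p_det

def posterior(b, a, z):
    """Bayes update; returns (P(z), updated belief)."""
    joint = [b[s] * obs_lik(a, s, z) for s in range(N)]
    pz = sum(joint)
    return pz, ([p / pz for p in joint] if pz > 0 else b)

def info_gain(b, a):
    """One-step reward: exact mutual information I(x; z | b, a)."""
    g = entropy(b)
    for z in (0, 1):
        pz, post = posterior(b, a, z)
        if pz > 0:
            g -= pz * entropy(post)
    return g

def myopic_action(b):
    """The cheap myopic policy that the paper's bounds are built from."""
    return max(range(N), key=lambda a: info_gain(b, a))

def greedy_value(b, h):
    """Value of following the myopic policy: a lower bound on the optimum."""
    if h == 0:
        return 0.0
    a = myopic_action(b)
    v = info_gain(b, a)
    for z in (0, 1):
        pz, post = posterior(b, a, z)
        if pz > 0:
            v += pz * greedy_value(post, h - 1)
    return v

def exact_value(b, h):
    """Exact finite-horizon value by expectimax over actions/observations.
    Cost grows as (N * |Z|)^h; pruning actions that fall outside policy
    bounds is what accelerates this search in the paper."""
    if h == 0:
        return 0.0
    best = -math.inf
    for a in range(N):
        v = info_gain(b, a)
        for z in (0, 1):
            pz, post = posterior(b, a, z)
            if pz > 0:
                v += pz * exact_value(post, h - 1)
        best = max(best, v)
    return best

if __name__ == "__main__":
    prior = [0.5, 0.3, 0.15, 0.05]
    print("myopic action:", myopic_action(prior))
    print("greedy 3-step value:", round(greedy_value(prior, 3), 4))
    print("exact  3-step value:", round(exact_value(prior, 3), 4))
```

The gap between `greedy_value` and `exact_value` indicates how much a myopic policy can leave on the table, while their relative costs (linear vs. exponential in the horizon) illustrate why bracketing the optimal action with myopic computations, and searching only inside that bracket, pays off.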
