Toy Example Dataset
The dataset used in the paper is a toy example: a 10x10 grid world with the agent starting at position (0, 0). Obstacles are placed at random, with an obstacle-to-free-position ratio of 0.2. The agent is given a plan π, an action sequence of 10 movements (up, down, left, right, with the obvious semantics). Each action fails independently with a Bernoulli failure probability p_fail, which is sampled uniformly from [0, 1]. A failed action results in the opposite movement (e.g. a failed up yields down). Before running BV, the agent is given a number of observations about its failure probability.
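The setup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the generator used for the paper. All function names are hypothetical, and several details that the description leaves open are assumptions: the 0.2 ratio is read as obstacles per free cell (so each cell is an obstacle with probability 0.2 / 1.2), the start cell is kept free, a move into an obstacle or off the grid leaves the agent in place, and observations are modelled as independent Bernoulli draws of the failure event.

```python
import random

# Movement offsets as (dx, dy); the axis orientation is an assumption.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
INVERSE = {"up": "down", "down": "up", "left": "right", "right": "left"}


def make_grid(size=10, ratio=0.2, rng=random):
    """Return a size x size grid of booleans; True marks an obstacle."""
    # Assumption: ratio = obstacles / free cells, so P(obstacle) = ratio / (1 + ratio).
    p_obstacle = ratio / (1.0 + ratio)
    grid = [[rng.random() < p_obstacle for _ in range(size)] for _ in range(size)]
    grid[0][0] = False  # assumption: the start cell is always free
    return grid


def sample_plan(length=10, rng=random):
    """Sample a plan as a uniformly random sequence of movements."""
    return [rng.choice(list(MOVES)) for _ in range(length)]


def execute(grid, plan, p_fail, rng=random):
    """Run the plan from (0, 0); a failed action becomes its inverse."""
    size = len(grid)
    x, y = 0, 0
    for action in plan:
        if rng.random() < p_fail:
            action = INVERSE[action]
        dx, dy = MOVES[action]
        nx, ny = x + dx, y + dy
        # Assumption: blocked or out-of-bounds moves leave the agent in place.
        if 0 <= nx < size and 0 <= ny < size and not grid[nx][ny]:
            x, y = nx, ny
    return (x, y)


def sample_observations(p_fail, n, rng=random):
    """Assumption: each observation is one Bernoulli draw (True = failure)."""
    return [rng.random() < p_fail for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    grid = make_grid(rng=rng)
    plan = sample_plan(rng=rng)
    p_fail = rng.uniform(0.0, 1.0)
    observations = sample_observations(p_fail, n=5, rng=rng)
    print(plan, observations, execute(grid, plan, p_fail, rng=rng))
```

Passing a seeded `random.Random` instance as `rng` makes a sampled instance reproducible; the number of observations (`n=5` here) is an arbitrary choice, since the description does not fix it.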
BibTeX: