The dataset used in the paper is the Minigrid environment, which is a 3D grid world with a goal at the bottom-right corner. The agent learns to navigate to the goal using human feedback.
BibTex:
Before browse our site, please accept our cookies policy