-
LunarLander environment
The dataset used in the paper is the LunarLander environment, a classic control problem in which the agent learns to land a lunar lander using human feedback.
-
CartPole, Pendulum, and LunarLander
The dataset used in the paper is a set of reinforcement learning environments, including CartPole, Pendulum, and LunarLander.