- CartPole environment
The dataset used in the paper is the CartPole environment, which is a classic control problem. The agent learns to balance a pole using human feedback.
- CartPole, Pendulum, and LunarLander
The dataset used in the paper is a set of environments for reinforcement learning, including CartPole, Pendulum, and LunarLander.