HERON: Hierarchical Preference-based Reinforcement Learning
The paper does not explicitly describe the dataset it uses; it states only that the authors trained policies in various environments using a hierarchical reward design framework.
BibTeX: