- DEFENDER: DTW-Based Episode Filtering Using Demonstrations for Enhancing RL S...
  The dataset used in this paper is a set of demonstrations for reinforcement learning, containing safe and unsafe trajectories.
- Universal and transferable adversarial attacks on aligned language models
  AdvBench is a dataset for evaluating the safety of large language models.
- PKU-SafeRLHF dataset
  The dataset used in the paper is the PKU-SafeRLHF dataset.