Jailbreak Prompts and Malicious Queries

The dataset comprises 448 in-the-wild jailbreak prompts and 161 malicious queries, from which the authors derived a systematization of five categories and ten unique jailbreak patterns.

BibTeX: