Harmless/Harmful Anchor Datasets

This dataset contains 100 harmless and 100 harmful anchor prompts for evaluating the performance of large language models.

BibTeX: