- Measuring Massive Multitask Language Understanding
A multiple-choice question set for evaluating large language models.
- Content Moderation Dataset (CMD)
A dataset of social media content containing potentially biased (unsafe) texts, along with unbiased (safe or benign) variations.