-
Measuring Massive Multitask Language Understanding
The dataset used in this paper is a multiple-choice question set for evaluating large language models.
-
CommonsenseQA
The dataset used in the paper is CommonsenseQA, a 5-way multiple-choice QA dataset that requires commonsense knowledge.