Measuring Massive Multitask Language Understanding

The dataset used in this paper is a multiple choice question set that allows for the evaluation of large language models.

BibTex: