- Massive Multitask Language Understanding (MMLU) dataset
  The MMLU dataset is a benchmark for measuring the behavior of large language models on a number of tasks. It consists of 15,908 multiple-choice questions distributed across 57...
- Anchored Answers
  Anchored Answers: Unravelling Positional Bias in GPT-2’s Multiple-Choice Questions