The MMLU dataset is a benchmark for measuring the multitask accuracy of large language models across a broad range of tasks. It consists of 15,908 multiple-choice questions distributed across 57...
The SciQ dataset is a multi-domain, multiple-choice question dataset consisting of 13,679 questions in physics, chemistry, biology, and other natural sciences.