HolisticBias
A large dataset for measuring bias in language models, including nearly 600 descriptor terms across 13 demographic axes.
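HolisticBias-style prompts are typically built by slotting descriptor terms into sentence templates. The sketch below illustrates that pattern with placeholder descriptors, noun phrases, and templates, not the dataset's actual lists.

```python
# Minimal sketch of template-based prompt generation in the spirit of
# HolisticBias. The descriptors, nouns, and templates below are illustrative
# stand-ins, not the actual ~600-term HolisticBias vocabulary.
from itertools import product

descriptors = ["left-handed", "middle-aged", "Deaf"]   # placeholder descriptors
nouns = ["person", "neighbor"]                         # placeholder noun phrases
templates = [
    "I am a {descriptor} {noun}.",
    "Just so you know, I'm a {descriptor} {noun}.",
]

prompts = [
    t.format(descriptor=d, noun=n)
    for t, d, n in product(templates, descriptors, nouns)
]

for p in prompts[:4]:
    print(p)
```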
MNLI, QQP, and SST-2
The datasets used in this paper cover three tasks: Multi-Genre Natural Language Inference (MNLI), Quora Question Pairs (QQP), and the Stanford Sentiment Treebank (SST-2).
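If the Hugging Face `datasets` copies of these GLUE tasks are an acceptable substitute for the originals, all three can be loaded in a few lines; the config and column names below come from the GLUE benchmark, not from anything specific to this paper.

```python
# Load the three GLUE tasks via the Hugging Face `datasets` library.
from datasets import load_dataset

mnli = load_dataset("glue", "mnli")   # premise / hypothesis / label (3-way NLI)
qqp  = load_dataset("glue", "qqp")    # question1 / question2 / label (duplicate or not)
sst2 = load_dataset("glue", "sst2")   # sentence / label (binary sentiment)

print(mnli["train"][0])
```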
Are Larger Pretrained Language Models Uniformly Better? Comparing Performance...
Larger language models have higher accuracy on average, but are they better on every single instance (datapoint)?
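The question invites an instance-level comparison rather than an aggregate one. The sketch below shows one way to tally per-datapoint wins and losses between a smaller and a larger model, using made-up prediction arrays.

```python
# Count, per instance, where the larger model improves on or regresses from
# the smaller one, instead of comparing only aggregate accuracy.
# `gold`, `small_preds`, and `large_preds` are placeholder arrays.
import numpy as np

gold        = np.array([0, 1, 1, 0, 2, 1])
small_preds = np.array([0, 1, 0, 0, 2, 0])
large_preds = np.array([0, 1, 1, 1, 2, 0])

small_correct = small_preds == gold
large_correct = large_preds == gold

only_large = np.sum(large_correct & ~small_correct)   # large model fixes these
only_small = np.sum(small_correct & ~large_correct)   # large model regresses here
both       = np.sum(small_correct & large_correct)

print(f"large-only correct: {only_large}, small-only correct: {only_small}, both: {both}")
```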
Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence A...
The authors used a variety of datasets for question answering, including TriviaQA, Natural Questions, CountryQA, and Jeopardy questions.
BIG-Bench Hard
The BIG-Bench Hard dataset is derived from the original BIG-Bench evaluation suite, focusing on tasks that pose challenges to existing language models.
Dense Reward for Free in RLHF
The paper does not explicitly describe its dataset, beyond noting that it is a preference dataset for language models.
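As an illustration of what such a preference dataset usually contains, the sketch below shows a hypothetical (prompt, chosen, rejected) record together with the standard pairwise reward-model loss; none of it is taken from the paper.

```python
# Illustrative preference-pair record plus the usual pairwise (Bradley-Terry)
# reward-model loss. The record and the reward scores are made up.
import torch
import torch.nn.functional as F

example = {
    "prompt":   "Explain photosynthesis to a child.",
    "chosen":   "Plants use sunlight to turn air and water into food.",
    "rejected": "Photosynthesis is glucose production in chloroplast thylakoids.",
}

# Scores a hypothetical reward model assigned to the chosen/rejected responses.
r_chosen   = torch.tensor([1.3])
r_rejected = torch.tensor([0.4])

# Pairwise loss: push the chosen completion's reward above the rejected one's.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
print(loss.item())
```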
Automata-based constraints for language model decoding
The dataset used in this paper is a collection of regular expressions and grammars for constraining language models.
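A toy sketch of how such constraints can act at decoding time: a hand-written character-level DFA (here for the pattern [0-9]+, not one of the paper's actual expressions) filters which candidate tokens remain allowed.

```python
# Sketch of automaton-constrained decoding: a token is allowed only if all of
# its characters can be consumed by a DFA for the target pattern. A real
# system would compile the DFA from the dataset's regexes or grammars.

DIGITS = set("0123456789")

def step(state, ch):
    """Transition for a 2-state DFA accepting [0-9]+ (None means reject)."""
    return "in_digits" if ch in DIGITS else None

def consume(state, token):
    """Return the DFA state after consuming `token`, or None if rejected."""
    for ch in token:
        state = step(state, ch)
        if state is None:
            return None
    return state

vocab = ["42", "7", "abc", "3x", "100"]
state = "start"
mask = {tok: consume(state, tok) is not None for tok in vocab}
print(mask)   # {'42': True, '7': True, 'abc': False, '3x': False, '100': True}
```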
Language Models of Spoken Dutch
The dataset consists of subtitles of television shows provided by the Flemish public-service broadcaster VRT. The dataset is used to train language models of spoken Dutch.
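As a toy illustration of language modeling on subtitle text, the sketch below fits a small n-gram model with NLTK on two made-up Dutch sentences; the n-gram choice is an assumption and may not match the paper's modeling approach.

```python
# Toy bigram language model on subtitle-style Dutch sentences using NLTK.
# The two sentences are invented stand-ins for the VRT subtitle corpus.
from nltk.lm import MLE
from nltk.lm.preprocessing import padded_everygram_pipeline

sentences = [
    ["dat", "is", "goed"],
    ["dat", "is", "niet", "goed"],
]

order = 2  # bigram model
train_data, vocab = padded_everygram_pipeline(order, sentences)

lm = MLE(order)
lm.fit(train_data, vocab)

print(lm.score("goed", ["is"]))   # P(goed | is)
```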
Grammaticality Judgment Task
The dataset used in the paper is a grammaticality judgment task featuring four linguistic phenomena: anaphora, center embedding, comparatives, and negative polarity constructions.
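One common way to score such a task is to compare a causal LM's log-probabilities on the two members of a minimal pair. The sketch below uses `gpt2` and an invented anaphora pair, neither of which comes from the paper.

```python
# Score a grammatical vs. ungrammatical sentence with a causal LM and check
# whether the grammatical member of the pair gets higher log-probability.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def sentence_logprob(sentence):
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean NLL over predicted tokens; scale back to a total.
    return -out.loss.item() * (ids.size(1) - 1)

grammatical   = "The boys thought about themselves."
ungrammatical = "The boys thought about himself."

print(sentence_logprob(grammatical) > sentence_logprob(ungrammatical))
```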
Finetuned language models are zero-shot learners
The instruction-tuning collection used for FLAN, in which existing NLP datasets are rephrased with natural-language instruction templates so the finetuned model can be evaluated zero-shot on held-out task types.
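To illustrate the instruction-phrasing idea, the sketch below wraps a made-up sentiment example in an invented instruction template; it is not one of the paper's actual templates.

```python
# Hypothetical example of rephrasing a classification example as a
# natural-language instruction with answer options.
def to_instruction(sentence, label):
    options = ["negative", "positive"]
    prompt = (
        "Is the sentiment of the following movie review positive or negative?\n"
        f"Review: {sentence}\n"
        f"Options: {', '.join(options)}"
    )
    return prompt, options[label]

prompt, target = to_instruction("A warm, funny, and quietly moving film.", 1)
print(prompt)
print("Target:", target)
```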
Self-Recognition in Language Models
A self-recognition test for language models using model-generated security questions.
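A conceptual sketch of one such trial is below; `query(model, prompt)` is a hypothetical stand-in for a real model API, and the prompts are illustrative rather than the paper's.

```python
# Conceptual sketch of a self-recognition trial built around model-generated
# "security questions". Everything here is a placeholder.
def query(model, prompt):
    # Placeholder: a real implementation would call the model's API here.
    return f"[{model}'s response to: {prompt[:40]}...]"

def self_recognition_trial(examinee, others):
    # 1. The examinee writes a question meant to help it recognize its own answer.
    question = query(examinee, "Write a security question whose answer would "
                               "let you recognize your own response later.")
    # 2. Collect answers from the examinee and from the other models.
    answers = [(m, query(m, question)) for m in (examinee, *others)]
    # 3. Ask the examinee to pick the answer it believes it wrote.
    listing = "\n".join(f"({i}) {a}" for i, (_, a) in enumerate(answers))
    return query(examinee, f"{question}\n{listing}\nWhich numbered answer is yours?")

print(self_recognition_trial("model-a", ["model-b", "model-c"]))
```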
Multilingual Language Models
The dataset used in this paper on multilingual language models.
Corpus Pairs Dataset
Corpus pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.
Minimal Pairs Dataset
Minimal pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.
Sentiment Training Dataset
Sentiment training dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.
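Taken together, these three resources suggest a probing recipe: fit a sentiment probe on frozen PLM representations, then compare its scores on minimal pairs. The sketch below follows that recipe with `bert-base-uncased`, a four-sentence training set, and one invented minimal pair, none of which are the paper's actual data.

```python
# Sketch of minimal-pair bias probing: train a small sentiment probe on frozen
# PLM embeddings, then compare its scores on sentences differing only in a
# demographic term. Model, sentences, and pair are illustrative stand-ins.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
plm = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(sentence):
    inputs = tok(sentence, return_tensors="pt")
    with torch.no_grad():
        out = plm(**inputs)
    return out.last_hidden_state[:, 0].squeeze(0).numpy()   # [CLS] vector

# Tiny stand-in for the sentiment training dataset.
train = [("This movie was wonderful.", 1), ("This movie was terrible.", 0),
         ("What a delightful day.", 1), ("What an awful day.", 0)]
X = [embed(s) for s, _ in train]
y = [label for _, label in train]
probe = LogisticRegression(max_iter=1000).fit(X, y)

# Minimal pair: identical except for the group term.
pair = ["The Dutch friend is here.", "The Turkish friend is here."]
scores = [probe.predict_proba([embed(s)])[0, 1] for s in pair]
print(dict(zip(pair, scores)))   # systematic gaps would suggest intrinsic bias
```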
Using Large Language Models to Simulate Multiple Humans
The dataset used in the paper to simulate human behavior in various experiments, including the Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of...
Language models are few-shot learners
GPT-3, a 175-billion-parameter autoregressive language model that performs a wide range of NLP tasks from a few in-context examples, without gradient-based fine-tuning.
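A small illustration of the few-shot, in-context setup associated with this paper: demonstrations are concatenated into the prompt and the model continues the final, unlabeled example. The translation pairs below are illustrative.

```python
# Build a few-shot prompt by concatenating task demonstrations and leaving the
# final example for the model to complete.
demonstrations = [
    ("sea otter", "loutre de mer"),
    ("peppermint", "menthe poivrée"),
]
query = "cheese"

prompt = "Translate English to French:\n"
prompt += "\n".join(f"{en} => {fr}" for en, fr in demonstrations)
prompt += f"\n{query} =>"

print(prompt)   # fed to the language model, which is expected to continue with the translation
```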
Demonstration ITerated Task Optimization (DITTO)
The dataset used in the paper is a collection of emails and blog posts from 20 distinct authors, with a focus on few-shot alignment of large language models.