-
Grammaticality Judgment Task
The dataset used in the paper is a grammaticality judgment task featuring four linguistic phenomena: anaphora, center embedding, comparatives, and negative polarity constructions. -
Finetuned language models are zero-shot learners
Finetuned language models are zero-shot learners -
Self-Recognition in Language Models
A self-recognition test for language models using model-generated security questions. -
SafeDecoding dataset
The dataset used in the SafeDecoding paper, which contains 32 harmful queries spanning 16 harmful categories. -
Multilingual Language Models
The dataset used in this paper for multilingual language models. -
Corpus Pairs Dataset
Corpus pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs. -
Minimal Pairs Dataset
Minimal pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs. -
Sentiment Training Dataset
Sentiment training dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs. -
Language Models with Image Descriptors
The Language Models with Image Descriptors dataset is used to evaluate the performance of the InstructVid2Vid model. -
Using Large Language Models to Simulate Multiple Humans
The dataset used in the paper to simulate human behavior in various experiments, including the Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of... -
ETHICS benchmark
The ETHICS benchmark is a dataset for evaluating the ethics of language models. -
MQUAKE: Assessing knowledge editing in language models via multi-hop questions
MQUAKE is a knowledge editing benchmark that includes MQUAKE-CF-3K based on counterfactual edits, and MQUAKE-T with temporal knowledge updates. -
HumanEval, MBPP, APPS
The dataset used in the paper is a code generation benchmark, consisting of 164 function declarations alongside their documentation, 500 test examples, each one is an... -
Comprehensive Assessment of Jailbreak Attacks against LLMs
The Comprehensive Assessment of Jailbreak Attacks against LLMs dataset is used to evaluate the effectiveness of jailbreak attacks on language models. -
Language models are few-shot learners
A language model that demonstrates capabilities in processing and generating human-like text. -
MQUAKE-CF and MQUAKE-T datasets
The MQUAKE-CF and MQUAKE-T datasets comprise multi-hop questions based on real-world facts. In MQUAKE-CF the edited facts are counterfactual, while MQUAKE-T covers temporal knowledge updates.