AANN construction dataset
A dataset of the AANN construction, the rare English pattern article + adjective + numeral + noun (e.g., "a beautiful five days").
CoLA corpus and AANN construction dataset
The CoLA corpus of acceptability judgments and the AANN construction dataset.
ETHICS benchmark
The ETHICS benchmark is a dataset for evaluating language models' knowledge of basic moral concepts.
HumanEval, MBPP, APPS
The datasets used in the paper are code-generation benchmarks, consisting of 164 function declarations with their documentation and 500 test examples, each of which is an...
Comprehensive Assessment of Jailbreak Attacks against LLMs
The Comprehensive Assessment of Jailbreak Attacks against LLMs dataset is used to evaluate the effectiveness of jailbreak attacks on language models.
Self-Supervised Alignment with Mutual Information
The dataset is used to train a language model to follow behavioral principles without preference labels, demonstrations, or human oversight.
GPT-2 small
The data used in this paper consist of the GPT-2 small language model and its residual stream activations.
BERT: Pre-training of deep bidirectional transformers for language understanding
This paper proposes BERT, a pre-trained deep bidirectional transformer for language understanding.
GPT-4 Dataset
The GPT-4 dataset is used to fine-tune the Qwen model.
Demonstration ITerated Task Optimization (DITTO)
The dataset used in the paper is a collection of emails and blog posts from 20 distinct authors, used to study few-shot alignment of large language models.
Towards the Scalable Evaluation of Cooperativeness in Language Models
The dataset is used to evaluate the cooperative tendencies of language models. It consists of scenarios with particular game-theoretic structures, generated through both...
SHP dataset
The SHP dataset is used to evaluate the performance of the proposed Compositional Preference Models (CPMs).
HH-RLHF dataset
The HH-RLHF dataset is used to evaluate the performance of the proposed Compositional Preference Models (CPMs).
Training Language Models to Perform Tasks
A dataset for training language models to perform tasks such as question answering and text classification.
Interpreting Learned Feedback Patterns in Large Language Models
The paper does not explicitly describe its dataset, but the authors use a condensed representation of LLM activations obtained from sparse...