-
Language Agency Bias Evaluation (LABE)
Language Agency Bias Evaluation (LABE) framework to systematically and comprehensively measure gender, racial, and intersectional biases in language agency across a wide scope... -
Towards a unified multi-dimensional evaluator for text generation
The NewsRoom dataset consists of 60 input source texts and 7 output summaries for each sample. -
Of human criteria and automatic metrics: A benchmark of the evaluation of sto...
The HANNA dataset contains 1056 creative stories generated from 96 prompts collected from WritingPrompt. -
A general theoretical paradigm to understand learning from human preferences
The paper proposes a novel approach to aligning language models with human preferences, focusing on the use of preference optimization in reward-free RLHF. -
Towards Answering Climate Questionnaires
Two new large-scale climate questionnaire datasets, CLIMA-CDP and CLIMA-INS, are introduced. The datasets are composed of semi-structured questionnaires from different... -
DALL-E3 and Stable Diffusion Dataset
A dataset used by the authors to test their hypothesis about the white bear phenomenon in large models. -
White Bear Phenomenon Dataset
A dataset generated by the authors to test their hypothesis about the white bear phenomenon in large models. -
Llama: Open and efficient foundation language models
The LLaMA dataset is a large language model dataset used in the paper. -
Proof-Pile-2
The dataset used for continual pre-training of large language models, with a focus on balancing the text distribution and mitigating overfitting. -
PipeTransformer: Automated Elastic Pipelining for Distributed Training of Tra...
The datasets used in this paper are ImageNet, SQuAD, and GLUE. -
SNOiC: Soft Labeling and Noisy Mixup based Open Intent Classification Model
This paper presents a Soft Labeling and Noisy Mixup-based open intent classification model (SNOiC). Most of the previous works have used threshold-based methods to identify open... -
Using Large Language Models to Simulate Multiple Humans
The dataset used in the paper to simulate human behavior in various experiments, including the Ultimatum Game, Garden Path Sentences, Milgram Shock Experiment, and Wisdom of... -
Self-StrAE at SemEval-2024 Task 1: Making Self-Structuring AutoEncoders Learn...
Self-StrAE is a model that processes a given sentence to generate both multi-level embeddings and a structure over the input. -
IWSLT-14 DE-EN
The paper uses the IWSLT-14 DE-EN machine translation dataset. -
Learning Language-Conditioned Robot Behavior from Offline Data and Crowd-Sour...
Learning language-conditioned robot behavior from offline data and crowd-sourced annotation.