-
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,... -
A general language assistant as a laboratory for alignment
A general language assistant for aligning language models with human users -
RealToxicityPrompts: Evaluating neural toxic degeneration in language models
A dataset for evaluating neural toxic degeneration in language models -
Alignment of language agents
A dataset for aligning language agents -
SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection
Open-vocabulary object detection (OvOD) has transformed detection into a language-guided task, empowering users to freely define their class vocabularies of interest during... -
WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese...
WanJuan: A comprehensive multimodal dataset for advancing English and Chinese large models. -
Wikipedia dataset
The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries... -
Multilingual Blending: LLM Safety Alignment Evaluation with Language Mixture -
ChatGPT model data
ChatGPT model data, used to generate text -
Adding A Filter Based on The Discriminator to Improve Unconditional Text Gene...
The dataset is used for unconditional text generation, and the authors propose a novel mechanism to improve the generator by adding a filter which has the same input as the... -
Textual Sports Commentary Dataset
The textual dataset is a collection of live sports commentaries scraped from various sources, including live score websites and YouTube. -
Sports Commentary Dataset
The dataset is a collection of live sports commentaries, including audio and textual data, used to train and evaluate machine learning models for event recognition and... -
CLEVR-Robot Environment
A benchmark for evaluating task compositionality and long-horizon tasks through object manipulation, with language serving as the mechanism for goal specification. -
PersonaChat dataset
The PersonaChat dataset is a large persona-conditioned chit-chat style dialogue dataset. -
OPT-66B and Llama2-70B
The paper uses two large language models, OPT-66B and Llama2-70B.