Anthropic Persona Dataset

The Persona dataset contains 99 different personas, each entailing 500 statements that align and 500 statements that disagree with the persona trait.

LyricCanvas

The LyricCanvas dataset is a large-scale collection of lyrics with noisy visual descriptions that represent their implicit meaning.

SQuAD

The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...

English and Luganda datasets for ASR-free keyword spotting

South African English and Luganda datasets

Feature learning for efficient ASR-free keyword spotting in low-resource lang...

ASR-free keyword spotting in low-resource languages

Tensor Trust Dataset

A dataset of prompt injection attacks for evaluating the effectiveness of Tensor Trust in detecting prompt injection attacks.

SPML Dataset

A dataset of system prompts and user prompts for evaluating the effectiveness of SPML in detecting prompt injection attacks.

Natural Questions

The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.

TriviaQA

The TriviaQA dataset is a collection of questions sourced from Quiz League websites, with sentence-level supporting facts annotation.

Examining the State-of-the-Art in News Timeline Summarization

Examining the state-of-the-art in news timeline summarization.

SST-2

The dataset used for the experiments across ten models, ranging from bag-of-words models to pre-trained transformers, and find that a model having higher AUC does not necessarily...

Deep Compositional Robotic Planners

A dataset for training a compositional hierarchical recurrent network to follow natural language commands in continuous environments.

MS MARCO: A Human-Generated Machine Reading Comprehension Dataset

The dataset is used for training and evaluating the MS MARCO model, a question answering model.

Photorealistic text-to-image diffusion models with deep language understanding

The authors present a photorealistic text-to-image diffusion model with deep language understanding.

Google Speech Commands Dataset

The Google Speech Commands Dataset contains 64,727 one-second-long utterance files which are recorded and labeled with one of 30 target categories.

Temporal Convolution for Real-time Keyword Spotting on Mobile Devices

Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide...

Wiki-40B, PG-19, C4, etc.

The dataset used in the paper is not explicitly described. However, it is mentioned that the authors used various benchmarks such as Wiki-40B, PG-19, C4, etc.

RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data is fundamentally...

CXR-LLAVA

A multimodal large language model for interpreting chest X-ray images.

Stanford Alpaca

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...

420 datasets found