-
Anthropic Persona Dataset
The Persona dataset contains 99 personas, each with 500 statements that agree with the persona trait and 500 statements that disagree with it. -
LyricCanvas
The LyricCanvas dataset is a large-scale collection of lyrics with noisy visual descriptions that represent their implicit meaning. -
English and Luganda datasets for ASR-free keyword spotting
South African English and Luganda datasets -
Feature learning for efficient ASR-free keyword spotting in low-resource lang...
ASR-free keyword spotting in low-resource languages -
Tensor Trust Dataset
A dataset of prompt injection attacks for evaluating the effectiveness of Tensor Trust in detecting prompt injection attacks. -
SPML Dataset
A dataset of system prompts and user prompts for evaluating the effectiveness of SPML in detecting prompt injection attacks. -
Natural Questions
The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer. -
Examining the State-of-the-Art in News Timeline Summarization
Examining the state-of-the-art in news timeline summarization. -
Deep Compositional Robotic Planners
A dataset for training a compositional hierarchical recurrent network to follow natural language commands in continuous environments. -
MS MARCO: A Human-Generated Machine Reading Comprehension Dataset
MS MARCO is a human-generated machine reading comprehension dataset used for training and evaluating question answering models. -
Photorealistic text-to-image diffusion models with deep language understanding
The authors present a photorealistic text-to-image diffusion model with deep language understanding. -
Google Speech Commands Dataset
The Google Speech Commands Dataset contains 64,727 one-second utterance files, each labeled with one of 30 target categories. -
Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Keyword spotting (KWS) plays a critical role in enabling speech-based user interactions on smart devices. Recent developments in the field of deep learning have led to wide... -
Wiki-40B, PG-19, C4, etc.
The dataset used in the paper is not explicitly described. However, it is mentioned that the authors used various benchmarks such as Wiki-40B, PG-19, C4, etc. -
RoentGen: Vision-Language Foundation Model for Chest X-ray Generation
Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data is fundamentally... -
Stanford Alpaca
The dataset used in the paper is not explicitly described, but it is mentioned that the authors used CIFAR-10 and CIFAR-100 datasets for image classification, and ImageNet-100...