- University of Maryland Reddit Suicidality Dataset
  The University of Maryland Reddit Suicidality Dataset contains Reddit posts from the r/SuicideWatch subreddit, used to assess suicidality risk based on user postings.
- CSMSC Dataset
  The CSMSC dataset is a corpus for Mandarin Chinese speech synthesis research.
- JVS Corpus
  The JVS corpus is a free Japanese multi-speaker voice corpus, used for various speech synthesis tasks.
- Jacquard Dataset
  The Jacquard dataset is a large-scale dataset for robotic grasp detection, featuring dense grasp rectangle annotations.
- Cornell Grasping Dataset
  The Cornell Grasping Dataset (CGD) contains manually labeled grasp annotations for a limited number of examples, focusing on detecting robotic grasps.
- WMT English-German Translation
  The WMT English-German translation task is used for supervised conditional language generation; the authors assess the model's performance in translating from English to German.
- MTG-Jamendo Dataset
  The MTG-Jamendo dataset is used for automatically recognizing emotions and themes in music recordings from the raw audio, focusing on mood and theme tagging.
- Cornell Movie Dialogues
  The Cornell Movie Dialogues dataset features two-character dialogues from movie scripts, capturing a wide variety of human interaction in many different fictional circumstances.
- MalwareTextDB
  The MalwareTextDB corpus consists of APT reports describing malware-related information, for text classification and token label prediction tasks.
- CelebA-HQ 256x256
  The 256x256 CelebA-HQ dataset is used to train an Image Transformer for autoregressive image generation.
- ImageNet 64x64
  The 64x64 ImageNet dataset is used to train a vector-quantized variational auto-encoder, which encodes images into a tensor of latents.
- HatEval Dataset
  The HatEval dataset provides annotated tweets for evaluating hate speech detection, specifically hate speech against immigrants and women, in a multilingual context.
- Synthesized Dataset of Stylized and Real Face Pairs
  A large-scale synthesized dataset of stylized face (SF) and ground-truth real face (RF) pairs is generated to train the Identity-preserving Face Recovery from Portraits (IFRP)...