20,499 datasets found

  • University of Maryland Reddit Suicidality Dataset

    The University of Maryland Reddit Suicidality Dataset contains Reddit posts from the r/SuicideWatch subreddit, used to assess suicidality risk based on user postings.
  • SVHN

    The SVHN (Street View House Numbers) dataset consists of over 600,000 digit images that are cropped from street view images, used for benchmarking algorithms dealing with noisy...
  • CSMSC Dataset

    The CSMSC dataset is a corpus for Mandarin Chinese speech synthesis research.
  • JVS Corpus

    The JVS corpus is a free Japanese multi-speaker voice corpus used for various speech synthesis tasks.
  • Jacquard Dataset

    The Jacquard dataset is a large-scale dataset for robotic grasp detection, featuring dense grasp rectangle annotations.
  • Cornell Grasping Dataset

    The Cornell Grasping Dataset (CGD) contains manually labeled grasp annotations for a limited number of examples, focusing on detecting robotic grasps.
  • WMT English-German Translation

    The WMT English-German translation task is used for supervised conditional language generation, assessing a model's performance in translating from English to German.
  • MTG-Jamendo Dataset

    The MTG-Jamendo dataset is used for automatically recognizing the emotions and themes in music recordings based on the raw audio, focusing on mood and theme tagging.
  • Cornell Movie Dialogues

    The Cornell Movie Dialogues dataset features two-character dialogues from movie scripts, capturing a large variety of human interaction in many different fictional circumstances.
  • MalwareTextDB

    The MalwareTextDB corpus consists of APT reports describing malware-related information for text classification and token label prediction tasks.
  • Holl-E

    The Holl-E dataset consists of dialogues with a single document provided per conversation, including spans in documents that indicate parts used for generating responses.
  • CelebA-HQ 256x256

    The 256x256 CelebA-HQ dataset is utilized to train an Image Transformer for autoregressive image generation.
  • ImageNet 64x64

    The 64x64 ImageNet dataset is used for training a vector-quantized variational auto-encoder, encoding images into a tensor of latents.
  • Pen

    The Pen dataset consists of pen-based user interfaces for anomaly detection involving user input patterns.
  • Optical

    The Optical dataset is a collection of optical character recognition data used for detecting anomalies in text recognition.
  • Satellite

    The Satellite dataset includes satellite imagery data used for anomaly detection tasks, identifying unique patterns in the images.
  • Letter

    The Letter dataset contains handwritten letters for anomaly detection tasks, where outliers represent specific letter patterns.
  • HatEval dataset

    The HatEval dataset provides annotated tweets to evaluate hate speech detection, specifically concerning immigrants and women in a multilingual context.
  • affNIST

    The affNIST dataset is created by applying various affine transformations to the MNIST digits, making it suitable for testing algorithms designed to handle geometric distortions.
  • Synthesized Dataset of Stylized and Real Face Pairs

    A large-scale synthesized dataset of stylized face (SF) and ground-truth real face (RF) pairs is generated to train the Identity-preserving Face Recovery from Portraits (IFRP)...