24,167 datasets found

Organizations: No Organization Formats: JSON

  • SemEval 2014 Task 4 dataset

    The SemEval 2014 Task 4 dataset contains labeled sentences and sentence-aspect pairs for aspect-term sentiment analysis, focusing on specific domains such as restaurants and...
  • Hands 2017 challenge dataset

    The Hands 2017 challenge dataset contains depth images used for training and testing 3D hand pose estimation methods, with a focus on various hand shapes and poses.
  • WMT 2014 English-to-French Dataset

    The WMT 2014 English-to-French dataset contains 36 million sentence pairs that are used to benchmark translation models.
  • WMT 2014 English-to-German Dataset

    The WMT 2014 English-to-German dataset consists of 4.5 million sentence pairs used for neural machine translation.
  • UGWC

    The UGWC (User-Generated Web Content) dataset includes various types of labeled data for sentence segmentation tasks, collected from user conversations in the financial domain, with...
  • Orchid

    The Orchid dataset is a Thai part-of-speech-tagged dataset containing 10,864 sentences hierarchically separated into paragraphs, sentences, and words, with manual POS tagging by...
  • General Language Understanding Evaluation (GLUE) benchmark

    GLUE is a multi-task benchmark that contains a diverse set of natural language understanding tasks including sentiment analysis, natural language inference, and textual...
  • IWSLT'14 German-to-English Translation Dataset

    The IWSLT'14 (International Workshop on Spoken Language Translation) German-to-English dataset consists of parallel sentences for machine translation tasks, containing approximately...
  • University of Maryland Reddit Suicidality Dataset

    The University of Maryland Reddit Suicidality Dataset contains Reddit posts from the r/SuicideWatch subreddit, used to assess suicidality risk based on user postings.
  • SVHN

    The SVHN (Street View House Numbers) dataset consists of over 600,000 digit images that are cropped from street view images, used for benchmarking algorithms dealing with noisy...
  • CSMSC Dataset

    The CSMSC dataset is a corpus for Mandarin Chinese speech synthesis research.
  • JVS Corpus

    The JVS corpus is a free Japanese multi-speaker voice corpus used for various speech synthesis tasks.
  • Jacquard Dataset

    The Jacquard dataset is a large-scale dataset for robotic grasp detection, featuring dense grasp rectangle annotations.
  • Cornell Grasping Dataset

    The Cornell Grasping Dataset (CGD) contains manually labeled grasp annotations for a limited number of examples, focusing on robotic grasp detection.
  • WMT English-German Translation

    The WMT English-German translation task is used for supervised conditional language generation; the authors use it to assess the model's performance when translating from English to German.
  • MTG-Jamendo Dataset

    The MTG-Jamendo dataset is used for automatically recognizing the emotions and themes in music recordings based on the raw audio, focusing on mood and theme tagging.
  • Cornell Movie Dialogues

    The Cornell Movie Dialogues dataset features two-character dialogues from movie scripts, capturing a large variety of human interaction in many different fictional circumstances.
  • MalwareTextDB

    The MalwareTextDB corpus consists of APT reports describing malware-related information, used for text classification and token label prediction tasks.
  • Holl-E

    The Holl-E dataset consists of dialogues with a single document provided per conversation, including spans in documents that indicate parts used for generating responses.
  • CelebA-HQ 256x256

    The 256x256 CelebA-HQ dataset is used to train an Image Transformer for autoregressive image generation.