No Organization - Organizations

SemEval 2014 Task 4 dataset

The SemEval 2014 Task 4 dataset contains labeled sentences and sentence-aspect pairs for aspect-term sentiment analysis, focusing on specific domains such as restaurants and...

Hands 2017 challenge dataset

The Hands 2017 challenge dataset contains depth images used for training and testing 3D hand pose estimation methods, with a focus on various hand shapes and poses.

WMT 2014 English-to-French Dataset

The WMT 2014 English-to-French dataset contains 36 million sentence pairs that are used to benchmark translation models.

WMT 2014 English-to-German Dataset

The WMT 2014 English-to-German dataset consists of 4.5 million sentence pairs used for neural machine translation.

UGWC

The UGWC (User-Generated Web Content) dataset includes various types of labeled data for sentence segmentation tasks, collected from user conversations in the financial domain, with...

Orchid

The Orchid dataset is a Thai part-of-speech-tagged dataset containing 10,864 sentences hierarchically separated into paragraphs, sentences, and words, with manual POS tagging by...

General Language Understanding Evaluation (GLUE) benchmark

GLUE is a multi-task benchmark that contains a diverse set of natural language understanding tasks including sentiment analysis, natural language inference, and textual...

IWSLT'14 German to English Translation Dataset

The IWSLT'14 (International Workshop on Spoken Language Translation) German-to-English dataset consists of parallel sentences for machine translation tasks, containing approximately...

University of Maryland Reddit Suicidality Dataset

The University of Maryland Reddit Suicidality Dataset contains Reddit posts from the r/SuicideWatch subreddit, used to assess suicidality risk based on user postings.

SVHN

The SVHN (Street View House Numbers) dataset consists of over 600,000 digit images that are cropped from street view images, used for benchmarking algorithms dealing with noisy...

CSMSC Dataset

The CSMSC dataset is a corpus for Mandarin Chinese speech synthesis research.

JVS Corpus

The JVS corpus is a free Japanese multi-speaker voice corpus used for various speech synthesis tasks.

Jacquard Dataset

The Jacquard dataset is a large-scale dataset for robotic grasp detection, featuring dense grasp rectangle annotations.

Cornell Grasping Dataset

The Cornell Grasping Dataset (CGD) contains manually labeled grasp annotations for a limited number of examples, focusing on detecting robotic grasps.

WMT English-German Translation

The WMT English-German translation task is used to evaluate supervised conditional language generation, measuring a model's performance when translating from English to German.

MTG-Jamendo Dataset

The MTG-Jamendo dataset is used for automatically recognizing the emotions and themes in music recordings based on the raw audio, focusing on mood and theme tagging.

Cornell Movie Dialogues

The Cornell Movie Dialogues dataset features two-character dialogues from movie scripts, capturing a large variety of human interaction in many different fictional circumstances.

MalwareTextDB

The MalwareTextDB corpus consists of APT (advanced persistent threat) reports describing malware-related information, annotated for text classification and token label prediction tasks.

Holl-E

The Holl-E dataset consists of dialogues with a single document provided per conversation, including spans in documents that indicate parts used for generating responses.

CelebA-HQ 256x256

The 256x256 CelebA-HQ dataset is used to train an Image Transformer for autoregressive image generation.

24,167 datasets found