Penn Treebank (PTB)

The Penn Treebank (PTB) dataset is used for language modeling tasks, specifically for next word prediction, where it serves to evaluate the trained models' performance in...

Dataset
JSON

ApolloScape Lane Segmentation Dataset

The ApolloScape dataset for lane segmentation contains more than 110,000 frames with high-quality pixel-level annotations, including 35 kinds of lane and road markings from...

Dataset
JSON

Benchmark datasets for Chinese spell checking

This dataset contains erroneous and corrected sentences for Chinese spell checking, divided into multiple benchmark datasets harvested from past shared tasks and additional OCR...

Dataset
JSON

AutoToon dataset

The AutoToon dataset is a paired dataset of human facial portrait photos and their corresponding geometrically warped cartoons created by trained artists, used to train the...

Dataset
JSON

IMDb Movie Review Dataset

The IMDb movie review dataset consists of a balanced sample of 25,000 positive and 25,000 negative reviews, divided into equal-size train and test sets, with an average document...

Dataset
JSON

French Street Name Signs (FSNS)

The French Street Name Signs (FSNS) dataset consists of over 1 million images of French street name signs extracted from Google Street View, posing challenges such as irregular,...

Dataset
JSON

First Quora Dataset Release - Question Pairs

The dataset consists of 404,290 question pairs from Quora, used to identify semantically duplicate questions.

Dataset
JSON

GuessWhat?! dataset

The GuessWhat?! dataset consists of sentences asked by humans during a cooperative game, containing a broader vocabulary.

Dataset
JSON

Toy dataset of sentences from CFG

The toy dataset consists of sentences generated from a context-free grammar (CFG) where sentences are framed as questions about objects.

Dataset
JSON

HELEN Dataset

The HELEN dataset consists of face photos with labeled facial components, utilized as the source domain for training the domain adaptation model for caricature face parsing.

Dataset
JSON

Amazon Product Reviews Dataset

The Amazon product reviews dataset contains unlabeled reviews used to augment the LAPTOP dataset for aspect-term sentiment analysis.

Dataset
JSON

Kaggle Restaurant Reviews Dataset

The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews used to supplement the labeled SemEval dataset for improved performance in sentiment...

Dataset
JSON

SemEval 2014 Task 4 dataset

The SemEval 2014 Task 4 dataset contains labeled sentences and sentence-aspect pairs for aspect-term sentiment analysis, focusing on specific domains such as restaurants and...

Dataset
JSON

Hands 2017 challenge dataset

The Hands 2017 challenge dataset contains depth images used for training and testing 3D hand pose estimation methods, with a focus on various hand shapes and poses.

Dataset
JSON

WMT 2014 English-to-French Dataset

The WMT 2014 English-to-French dataset contains 36 million sentence pairs that are used to benchmark translation models.

Dataset
JSON

WMT 2014 English-to-German Dataset

The WMT 2014 English-to-German dataset consists of 4.5 million sentence pairs used for neural machine translation.

Dataset
JSON

UGWC

The UGWC (User-Generated Web Content) dataset includes various types of labeled data for sentence segmentation tasks, collected from user conversations in the financial domain, with...

Dataset
JSON

Orchid

The Orchid dataset is a Thai part-of-speech-tagged dataset containing 10,864 sentences hierarchically separated into paragraphs, sentences, and words, with manual POS tagging by...

Dataset
JSON

General Language Understanding Evaluation (GLUE) benchmark

GLUE is a multi-task benchmark that contains a diverse set of natural language understanding tasks including sentiment analysis, natural language inference, and textual...

Dataset
JSON

IWSLT'14 German to English Translation Dataset

The IWSLT'14 (International Workshop on Spoken Language Translation) German-to-English dataset consists of parallel sentences for machine translation tasks, containing approximately...

Dataset
JSON

20,499 datasets found