PAWS-X

PAWS-X is a cross-lingual adversarial dataset for paraphrase identification consisting of 23,659 human-translated pairs in six languages (French, Spanish, German, Chinese,...

CUB-200-2011 Dataset

CUB-200-2011 is a fine-grained image dataset containing 11,788 images of birds across 200 species, used for few-shot learning and fine-grained classification.

Yahoo Reviews Dataset

The Yahoo Reviews dataset provides user-generated textual reviews for building models that require review data.

Stanford Natural Language Inference (SNLI)

The SNLI (Stanford Natural Language Inference) dataset is used for evaluating language understanding tasks and consists of sentence pairs annotated for their entailment...

WMT English-German dataset

The WMT English-German dataset is used to evaluate machine translation models.

Filtered OpenSubtitles (fOST)

The Filtered OpenSubtitles dataset contains high-coherence context-response pairs extracted from the main OpenSubtitles corpus, aimed at ensuring better quality in conversational...

OpenSubtitles

The OpenSubtitles corpus is used for training and evaluating conversational response generation models, providing context-response pairs from dialogue turn segments.

Stochastic Sequential MNIST (ssMNIST)

The Stochastic Sequential MNIST (ssMNIST) dataset consists of higher-order sequences of randomly chosen MNIST digits that are drawn according to a predetermined list of labels,...

Penn Treebank (PTB)

The Penn Treebank (PTB) dataset is used for language modeling tasks, specifically for next word prediction, where it serves to evaluate the trained models' performance in...

ApolloScape Lane Segmentation Dataset

The ApolloScape dataset for lane segmentation contains more than 110,000 frames with high-quality pixel-level annotations, including 35 kinds of lane and road markings from...

Benchmark datasets for Chinese spell checking

This dataset contains erroneous and corrected sentences for Chinese spell checking, divided into multiple benchmark datasets harvested from past shared tasks and additional OCR...

AutoToon dataset

The AutoToon dataset is a paired dataset of human facial portrait photos and their corresponding geometrically warped cartoons created by trained artists, used to train the...

IMDb Movie Review Dataset

The IMDb movie review dataset consists of a balanced sample of 25,000 positive and 25,000 negative reviews, divided into equal-size train and test sets, with an average document...

French Street Name Signs (FSNS)

The French Street Name Signs (FSNS) dataset consists of over 1 million images of French street name signs extracted from Google Street View, posing challenges such as irregular,...

First Quora Dataset Release - Question Pairs

The dataset consists of 404,290 question pairs from Quora, used to identify semantically duplicate questions.

GuessWhat?! dataset

The GuessWhat?! dataset consists of questions asked by humans during a cooperative game and covers a broader vocabulary.

Toy dataset of sentences from CFG

The toy dataset consists of sentences generated from a context-free grammar (CFG) where sentences are framed as questions about objects.

HELEN Dataset

The HELEN dataset consists of face photos with labeled facial components, utilized as the source domain for training the domain adaptation model for caricature face parsing.

Amazon Product Reviews Dataset

The Amazon product reviews dataset contains unlabeled reviews used to augment the LAPTOP dataset for aspect-term sentiment analysis.

Kaggle Restaurant Reviews Dataset

The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews used to supplement the labeled SemEval dataset for improved performance in sentiment...

24,167 datasets found