- CUB-200-2011 Dataset
CUB-200-2011 is a fine-grained image dataset containing 11,788 images of birds across 200 species, used for few-shot learning and fine-grained classification.
- Yahoo Reviews Dataset
The Yahoo Reviews dataset provides user-generated review text for building models that require textual review data.
- Stanford Natural Language Inference (SNLI)
The SNLI (Stanford Natural Language Inference) dataset is used for evaluating language understanding tasks and consists of sentence pairs annotated for their entailment...
- WMT English-German dataset
The WMT English-German dataset is a machine translation benchmark used for evaluating translation models.
- Filtered OpenSubtitles (fOST)
The Filtered OpenSubtitles (fOST) dataset contains high-coherence context-response pairs extracted from the main OpenSubtitles corpus, aimed at ensuring better quality in conversational...
- OpenSubtitles
The OpenSubtitles corpus is used for training and evaluating conversational response generation models, providing context-response pairs from dialogue turn segments.
- Stochastic Sequential MNIST (ssMNIST)
The Stochastic Sequential MNIST (ssMNIST) dataset consists of higher-order sequences of randomly chosen MNIST digits that are drawn according to a predetermined list of labels,...
- Penn Treebank (PTB)
The Penn Treebank (PTB) dataset is used for language modeling tasks, specifically for next-word prediction, where it serves to evaluate the trained models' performance in...
- ApolloScape Lane Segmentation Dataset
The ApolloScape dataset for lane segmentation contains more than 110,000 frames with high-quality pixel-level annotations, including 35 kinds of lane and road markings from...
- Benchmark datasets for Chinese spell checking
This dataset contains erroneous and corrected sentences for Chinese spell checking, divided into multiple benchmark datasets harvested from past shared tasks and additional OCR...
- AutoToon dataset
The AutoToon dataset is a paired dataset of human facial portrait photos and their corresponding geometrically warped cartoons created by trained artists, used to train the...
- IMDb Movie Review Dataset
The IMDb movie review dataset consists of a balanced sample of 25,000 positive and 25,000 negative reviews, divided into equal-size train and test sets, with an average document...
- French Street Name Signs (FSNS)
The French Street Name Signs (FSNS) dataset consists of over 1 million images of French street name signs extracted from Google Street View, posing challenges such as irregular,...
- First Quora Dataset Release - Question Pairs
The dataset consists of 404,290 question pairs from Quora, used to identify semantically duplicate questions.
- GuessWhat?! dataset
The GuessWhat?! dataset consists of questions asked by humans during a cooperative game and contains a broader vocabulary.
- Toy dataset of sentences from CFG
The toy dataset consists of sentences generated from a context-free grammar (CFG), with each sentence framed as a question about objects.
- HELEN Dataset
The HELEN dataset consists of face photos with labeled facial components, utilized as the source domain for training the domain adaptation model for caricature face parsing.
- Amazon Product Reviews Dataset
The Amazon product reviews dataset contains unlabeled reviews used to augment the LAPTOP dataset for aspect-term sentiment analysis.
- Kaggle Restaurant Reviews Dataset
The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews used to supplement the labeled SemEval dataset for improved performance in sentiment...