- Penn Treebank (PTB)
The Penn Treebank (PTB) dataset is used for language modeling, specifically next-word prediction, where it serves to evaluate the trained models' performance in...
- ApolloScape Lane Segmentation Dataset
The ApolloScape dataset for lane segmentation contains more than 110,000 frames with high-quality pixel-level annotations, including 35 kinds of lane and road markings from...
- Benchmark datasets for Chinese spell checking
This dataset contains erroneous and corrected sentences for Chinese spell checking, divided into multiple benchmark datasets harvested from past shared tasks and additional OCR...
- AutoToon dataset
The AutoToon dataset is a paired dataset of human facial portrait photos and their corresponding geometrically warped cartoons created by trained artists, used to train the...
- IMDb Movie Review Dataset
The IMDb movie review dataset consists of a balanced sample of 25,000 positive and 25,000 negative reviews, divided into equal-size train and test sets, with an average document...
- French Street Name Signs (FSNS)
The French Street Name Signs (FSNS) dataset consists of over 1 million images of French street name signs extracted from Google Street View, posing challenges such as irregular,...
- First Quora Dataset Release - Question Pairs
The dataset consists of 404,290 question pairs from Quora, used to identify semantically duplicate questions.
- GuessWhat?! dataset
The GuessWhat?! dataset consists of sentences asked by humans during a cooperative game, containing a broader vocabulary.
- Toy dataset of sentences from CFG
The toy dataset consists of sentences generated from a context-free grammar (CFG) where sentences are framed as questions about objects.
- HELEN Dataset
The HELEN dataset consists of face photos with labeled facial components, used as the source domain for training the domain adaptation model for caricature face parsing.
- Amazon Product Reviews Dataset
The Amazon product reviews dataset contains unlabeled reviews used to augment the LAPTOP dataset for aspect-term sentiment analysis.
- Kaggle Restaurant Reviews Dataset
The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews used to supplement the labeled SemEval dataset for improved performance in sentiment...
- SemEval 2014 Task 4 dataset
The SemEval 2014 Task 4 dataset contains labeled sentences and sentence-aspect pairs for aspect-term sentiment analysis, focusing on specific domains such as restaurants and...
- Hands 2017 challenge dataset
The Hands 2017 challenge dataset contains depth images used for training and testing 3D hand pose estimation methods, with a focus on various hand shapes and poses.
- WMT 2014 English-to-French Dataset
The WMT 2014 English-to-French dataset contains 36 million sentence pairs that are used to benchmark translation models.
- WMT 2014 English-to-German Dataset
The WMT 2014 English-to-German dataset consists of 4.5 million sentence pairs used for neural machine translation.
- General Language Understanding Evaluation (GLUE) benchmark
GLUE is a multi-task benchmark that contains a diverse set of natural language understanding tasks including sentiment analysis, natural language inference, and textual...
- IWSLT'14 German to English Translation Dataset
The IWSLT’14 (International Workshop on Spoken Language Translation) German to English dataset consists of parallel sentences for machine translation tasks, containing approximately...