20,499 datasets found

  • Penn Treebank (PTB)

    The Penn Treebank (PTB) dataset is used for language modeling tasks, specifically for next word prediction, where it serves to evaluate the trained models' performance in...
  • ApolloScape Lane Segmentation Dataset

    The ApolloScape dataset for lane segmentation contains more than 110,000 frames with high quality pixel-level annotations, including 35 kinds of lane and road markings from...
  • Benchmark datasets for Chinese spell checking

    This dataset contains erroneous and corrected sentences for Chinese spell checking, divided into multiple benchmark datasets harvested from past shared tasks and additional OCR...
  • AutoToon dataset

    The AutoToon dataset is a paired dataset of human facial portrait photos and their corresponding geometrically warped cartoons created by trained artists, used to train the...
  • IMDb Movie Review Dataset

    The IMDb movie review dataset consists of a balanced sample of 25,000 positive and 25,000 negative reviews, divided into equal-size train and test sets, with an average document...
  • French Street Name Signs (FSNS)

    The French Street Name Signs (FSNS) dataset consists of over 1 million images of French street name signs extracted from Google Street View, posing challenges such as irregular,...
  • First Quora Dataset Release - Question Pairs

    The dataset consists of 404,290 question pairs from Quora, used to identify semantically duplicate questions.
  • GuessWhat?! dataset

    The GuessWhat?! dataset consists of sentences asked by humans during a cooperative game, containing a broader vocabulary.
  • Toy dataset of sentences from CFG

    The toy dataset consists of sentences generated from a context-free grammar (CFG) where sentences are framed as questions about objects.
  • HELEN Dataset

    The HELEN dataset consists of face photos with labeled facial components, utilized as the source domain for training the domain adaptation model for caricature face parsing.
  • Amazon Product Reviews Dataset

    The Amazon product reviews dataset contains unlabeled reviews used to augment the LAPTOP dataset for aspect-term sentiment analysis.
  • Kaggle Restaurant Reviews Dataset

    The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews used to supplement the labeled SemEval dataset for improved performance in sentiment...
  • SemEval 2014 Task 4 dataset

    The SemEval 2014 Task 4 dataset contains labeled sentences and sentence-aspect pairs for aspect-term sentiment analysis, focusing on specific domains such as restaurants and...
  • Hands 2017 challenge dataset

    The Hands 2017 challenge dataset contains depth images used for training and testing 3D hand pose estimation methods, with a focus on various hand shapes and poses.
  • WMT 2014 English-to-French Dataset

    The WMT 2014 English-to-French dataset contains 36 million sentence pairs that are used to benchmark translation models.
  • WMT 2014 English-to-German Dataset

    The WMT 2014 English-to-German dataset consists of 4.5 million sentence pairs used for neural machine translation.
  • UGWC

    The UGWC (User-Generated Web Content) dataset includes various types of labeled data for sentence segmentation tasks, collected from user conversations in the financial domain, with...
  • Orchid

    The Orchid dataset is a Thai part-of-speech-tagged dataset containing 10,864 sentences hierarchically separated into paragraphs, sentences, and words, with manual POS tagging by...
  • General Language Understanding Evaluation (GLUE) benchmark

    GLUE is a multi-task benchmark that contains a diverse set of natural language understanding tasks including sentiment analysis, natural language inference, and textual...
  • IWSLT'14 German to English Translation Dataset

    The IWSLT'14 (International Workshop on Spoken Language Translation) German to English dataset consists of parallel sentences for machine translation tasks, containing approximately...