24,167 datasets found

Organizations: No Organization; Formats: JSON

  • PAWS-X

    PAWS-X is a cross-lingual adversarial dataset for paraphrase identification consisting of 23,659 human-translated pairs in six languages (French, Spanish, German, Chinese,...
  • CUB-200-2011 Dataset

    CUB-200-2011 is a fine-grained image dataset containing 11,788 images of birds across 200 species, used for few-shot learning and fine-grained classification.
  • Yahoo Reviews Dataset

    The Yahoo dataset is used to build models that require textual review data, specifically user-generated reviews.
  • Stanford Natural Language Inference (SNLI)

    The SNLI (Stanford Natural Language Inference) dataset is used to evaluate language understanding and consists of sentence pairs annotated for their entailment...
  • WMT English-German dataset

    The WMT English-German dataset is used to evaluate machine translation models.
  • Filtered OpenSubtitles (fOST)

    The Filtered OpenSubtitles dataset contains high-coherence context-response pairs extracted from the main OpenSubtitles corpus, aimed at ensuring better quality in conversational...
  • OpenSubtitles

    The OpenSubtitles corpus is used to train and evaluate conversational response generation models, providing context-response pairs from dialogue turn segments.
  • Stochastic Sequential MNIST (ssMNIST)

    The Stochastic Sequential MNIST (ssMNIST) dataset consists of higher-order sequences of randomly chosen MNIST digits that are drawn according to a predetermined list of labels,...
  • Penn Treebank (PTB)

    The Penn Treebank (PTB) dataset is used for language modeling, specifically next-word prediction, where it serves to evaluate trained models' performance in...
  • ApolloScape Lane Segmentation Dataset

    The ApolloScape dataset for lane segmentation contains more than 110,000 frames with high-quality pixel-level annotations, including 35 kinds of lane and road markings from...
  • Benchmark datasets for Chinese spell checking

    This dataset contains erroneous and corrected sentences for Chinese spell checking, divided into multiple benchmark datasets harvested from past shared tasks and additional OCR...
  • AutoToon dataset

    The AutoToon dataset is a paired dataset of human facial portrait photos and their corresponding geometrically warped cartoons created by trained artists, used to train the...
  • IMDb Movie Review Dataset

    The IMDb movie review dataset consists of a balanced sample of 25,000 positive and 25,000 negative reviews, divided into equal-size train and test sets, with an average document...
  • French Street Name Signs (FSNS)

    The French Street Name Signs (FSNS) dataset consists of over 1 million images of French street name signs extracted from Google Street View, posing challenges such as irregular,...
  • First Quora Dataset Release - Question Pairs

    The dataset consists of 404,290 question pairs from Quora, used to identify semantically duplicate questions.
  • GuessWhat?! dataset

    The GuessWhat?! dataset consists of questions asked by humans during a cooperative game and has a broader vocabulary.
  • Toy dataset of sentences from CFG

    The toy dataset consists of sentences generated from a context-free grammar (CFG), each framed as a question about objects.
  • HELEN Dataset

    The HELEN dataset consists of face photos with labeled facial components and is used as the source domain for training a domain adaptation model for caricature face parsing.
  • Amazon Product Reviews Dataset

    The Amazon product reviews dataset contains unlabeled reviews used to augment the LAPTOP dataset for aspect-term sentiment analysis.
  • Kaggle Restaurant Reviews Dataset

    The Kaggle sentiment analysis competition dataset contains unlabeled restaurant reviews, used to supplement the labeled SemEval dataset and improve performance in sentiment...