5 datasets found

Tags: text data

  • Wikipedia articles

    The Wikipedia articles dataset consists of image-text pairs designed for cross-modal retrieval applications.
  • Wikicorpus

    A dataset used in experiments to evaluate the adaptation of language models to nonstandard text.
  • Shifts Machine Translation dataset

    The Shifts Machine Translation dataset consists of pairs of source and target sentences in English and Russian.
  • Twitter Dataset

    The Twitter Dataset is a collection of tweets in three languages (English, Dutch, and German), annotated with Plutchik's emotions.
  • CommonCrawl

    CommonCrawl is a non-profit organization that provides a large corpus of web pages for research and development purposes.