4 datasets found

Tags: text generation

  • C4

    The Colossal Clean Crawled Corpus (C4), a large collection of web text documents used for pre-training language models.
  • OpenWebText Corpus

    A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
  • One Billion Words Dataset

    A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
  • Wikitext-103

    Wikitext-103 is a general English-language corpus containing Good and Featured Wikipedia articles.
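
Several of the entries above describe language modeling as predicting the next word in a sequence given the previous words. A minimal sketch of that task, assuming a simple bigram count model and a made-up toy corpus, might look like the following. The names `train_bigram` and `predict_next` are hypothetical and do not come from any of the listed datasets or their tooling. Models trained on these corpora are typically neural networks rather than count tables.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus, invented purely for illustration (not drawn from any listed dataset).
tokens = "the cat sat on the mat and the cat slept".split()
model = train_bigram(tokens)
print(predict_next(model, "the"))
```

A bigram model conditions only on the single previous word. The same interface extends to longer contexts by keying the counts on tuples of preceding words.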