4 datasets found

Tags: language model

  • Wikipedia Corpus

    The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,...
  • Gutenberg Corpus

    A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation.
  • Language models are few-shot learners

    A language model that demonstrates capabilities in processing and generating human-like text.
  • C4

    A dataset used for pre-training language models, consisting of a large collection of text documents.