Text Generation - Groups

Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus consisting of 7,500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,...

Gutenberg Corpus

A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation.

Language models are few-shot learners

A language model that demonstrates capabilities in processing and generating human-like text.

C4

A dataset used for pre-training language models, containing a large collection of text documents.

4 datasets found
