-
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,...
-
Gutenberg Corpus
A dataset of 2,857 books written by 141 authors, used for pre-training and fine-tuning a language model for author-stylized text generation.
-
Language Models are Few-Shot Learners
The paper introducing GPT-3, a large language model that can process and generate human-like text from only a few in-context examples.