Text Generation - Groups

WebText

The dataset used in this paper is WebText, a widely used dataset for natural language processing tasks.

Dataset
JSON

Wikipedia Corpus

The dataset used in the paper is a subset of the Wikipedia corpus consisting of 7500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,...

Dataset
JSON

Wikitext-2

The paper does not describe the dataset explicitly, but it notes that the authors used the Wikitext-2 dataset for text generation tasks.

Dataset
JSON

Text8

Text8 is a text corpus commonly used to train Word2Vec, a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.

Dataset
JSON

Training Transformers to Perform Tasks

A dataset for training transformers to perform tasks such as language translation and text generation.

Dataset
JSON

5 datasets found
