Text Generation - Groups

Wikitext-2

A dataset used for text generation tasks. The source paper names Wikitext-2 but does not describe it in detail.

C4

A dataset used for pre-training language models, containing a large collection of text documents.

Text8

A text corpus commonly used to train Word2Vec, a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.

Wikitext-103

A general English-language corpus built from Good and Featured Wikipedia articles.
