-
BookCorpus
A corpus of paragraphs of unlabeled text, used in this paper for unsupervised sentence representation learning. -
Reuters RCV1-v2
Reuters RCV1-v2 contains 804,414 newswire articles. Its 103 topics form a tree hierarchy, so documents typically carry multiple labels. The data was randomly... -
Penn Treebank dataset
The dataset used in the paper is the Penn Treebank, a corpus of English text annotated with part-of-speech tags and syntactic parse trees, widely used as a benchmark for language modeling and parsing. -
MNIST-SVHN-Text dataset
The MNIST-SVHN-Text dataset is a multi-modal dataset consisting of images, text, and labels. -
Training Language Models to Perform Tasks
A dataset for training language models to perform tasks such as question answering and text classification. -
E2E dataset
The E2E dataset consists of 50K restaurant reviews, each labeled with attributes such as food type, price, and customer rating. -
Elsevier OA CC-BY corpus
The Elsevier OA CC-BY corpus consists of 40,000 open-access articles from across Elsevier's journals, spanning a diverse range of research disciplines.