Wall Street Journal corpus
The Wall Street Journal corpus (wsj), WikiText-103 (wiki), and the dev split of LibriSpeech (lib-dev) are used.
BookCorpus
The dataset used in this paper for unsupervised sentence representation learning. It consists of paragraphs of unlabeled text.