Text Corpus - Groups - LDM

Wall Street Journal corpus

The Wall Street Journal corpus (wsj), WikiText-103 (wiki), and the dev split of LibriSpeech (lib-dev) are used.
- Dataset
- JSON
Text8

Word2Vec is a distributed word embedding generator that uses an artificial neural network to learn dense vector representations of words.
- Dataset
- JSON
BookCorpus

The dataset used in this paper for unsupervised sentence representation learning, consisting of paragraphs from unlabeled text.
- Dataset
- JSON
