- OpenWebText Corpus
A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
- One Billion Words Dataset
A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
- Wikitext-103
The dataset used in this paper is Wikitext-103, a general English-language corpus drawn from Wikipedia articles rated Good or Featured.
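
To make the next-word prediction task concrete, here is a minimal sketch of a bigram count model in Python. It is only an illustration of the task that these datasets are used for, not the method of any particular paper: the function names `train_bigram` and `predict_next` and the toy corpus are assumptions made up for this example, and real language models condition on far more than the single previous word.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if `prev` was never seen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Toy corpus (hypothetical; not taken from any of the datasets listed above).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))    # most frequent word after "the"
print(predict_next(model, "zebra"))  # unseen context, so no prediction
```

On a corpus such as Wikitext-103, the same idea applies at a much larger scale: the model is fit on the training split and judged by how well it predicts held-out text.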