- OpenWebText Corpus
  An open-source recreation of OpenAI's WebText corpus, used for language modeling: predicting the next word in a sequence given the previous words.
- One Billion Word Dataset
  A language modeling dataset built from news text, where the goal is to predict the next word in a sequence given the previous words.
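Both entries describe the same task. The sketch below makes it concrete under some assumptions. It uses whitespace-tokenized text and a made-up three-sentence toy corpus, not text from either dataset. It shows how one sequence becomes (previous words, next word) training pairs, then fits a bigram count model. The bigram model is a deliberate simplification: it looks only at the single previous word instead of the full history. The names `next_word_pairs` and `BigramModel` are hypothetical illustrations, not part of any dataset's tooling.

```python
from collections import Counter, defaultdict


def next_word_pairs(tokens):
    """Turn one token sequence into (previous words, next word) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


class BigramModel:
    """Hypothetical toy next-word predictor that conditions only on the last word."""

    def __init__(self):
        # counts[prev][word] = how often `word` directly followed `prev`
        self.counts = defaultdict(Counter)

    def fit(self, sentences):
        for tokens in sentences:
            for context, target in next_word_pairs(tokens):
                self.counts[context[-1]][target] += 1
        return self

    def prob(self, prev, word):
        """Estimate P(word | prev) by relative frequency; 0.0 for unseen contexts."""
        followers = self.counts.get(prev)
        if not followers:
            return 0.0
        return followers[word] / sum(followers.values())

    def predict(self, prev):
        """Return the most frequent continuation of `prev`, or None if never seen."""
        followers = self.counts.get(prev)
        if not followers:
            return None
        return followers.most_common(1)[0][0]


# Toy corpus (assumption: invented sentences, already split on whitespace).
sentences = [
    "the cat sat on the mat".split(),
    "the cat ate the fish".split(),
    "the dog sat on the rug".split(),
]
model = BigramModel().fit(sentences)
print(model.predict("the"))  # "cat" follows "the" twice, more than any other word
```

Neural language models trained on corpora like these replace the count table with a learned network that conditions on the whole preceding context. The prediction target, the next word, stays the same.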