OpenWebText Corpus

A dataset for language modeling, the task of predicting the next word in a sequence given the words that precede it.

BibTeX: