Billion Word Benchmark Dataset

The dataset contains 768M tokens and is intended for language modeling.

BibTeX: