Billion Word Benchmark Dataset

The Billion Word Benchmark is a language modeling dataset containing 768M tokens.

Cite this as

Hassan et al. (2024). Dataset: Billion Word Benchmark Dataset. https://doi.org/10.57702/bprj7ycm

DOI retrieved: December 3, 2024

Additional Info

Field        Value
Created      December 3, 2024
Last update  December 3, 2024
Author       Hassan et al.