One Billion Word

The One Billion Word dataset is a large text corpus of about 0.8 billion words drawn from a vocabulary of 793,471 words. It is used as a benchmark for word-level language modeling.
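To make "word-level language modeling" concrete, here is a minimal sketch. It assumes whitespace tokenization and a hypothetical `<UNK>` token for words below a count threshold. The helper names (`build_vocab`, `map_oov`, `unigram_perplexity`) and the `min_count` parameter are illustrative only. They are not part of the dataset's release or of any official loader. The sketch fits an add-one-smoothed unigram model on training sentences and reports perplexity on held-out sentences. Perplexity is the usual evaluation metric for this benchmark, although real models condition on context rather than using unigrams.

```python
import math
from collections import Counter

UNK = "<UNK>"  # hypothetical out-of-vocabulary token (an assumption of this sketch)


def build_vocab(sentences, min_count=1):
    """Keep whitespace-separated tokens seen at least min_count times, plus UNK."""
    counts = Counter(tok for s in sentences for tok in s.split())
    vocab = {w for w, c in counts.items() if c >= min_count}
    vocab.add(UNK)
    return vocab


def map_oov(sentence, vocab):
    """Tokenize on whitespace and replace out-of-vocabulary tokens with UNK."""
    return [tok if tok in vocab else UNK for tok in sentence.split()]


def unigram_perplexity(train, test, min_count=1):
    """Perplexity of an add-one (Laplace) smoothed unigram model on `test`."""
    vocab = build_vocab(train, min_count)
    counts = Counter(tok for s in train for tok in map_oov(s, vocab))
    total = sum(counts.values())
    v = len(vocab)
    log_prob, n = 0.0, 0
    for s in test:
        for tok in map_oov(s, vocab):
            # Smoothed probability: every vocabulary word gets one pseudo-count.
            log_prob += math.log((counts[tok] + 1) / (total + v))
            n += 1
    if n == 0:
        raise ValueError("test set contains no tokens")
    # Perplexity is exp of the average negative log-likelihood per token.
    return math.exp(-log_prob / n)
```

For example, with `train = ["the cat sat", "the dog sat"]` and `min_count=2`, the words "cat" and "dog" fall below the threshold and become `<UNK>`. The test sentence "the bird sat" then gets a perplexity of about 3.0, because each of its three tokens receives probability 1/3.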

Cite this as

Alexandre de Brébisson, Pascal Vincent (2024). Dataset: One Billion Word. https://doi.org/10.57702/u1dsk8f1

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined in: https://doi.org/10.48550/arXiv.1604.08859
Author: Alexandre de Brébisson
More authors: Pascal Vincent
Homepage: https://www.clsp.cs.cmu.edu/CLSP/datasets.html