Billion Word Benchmark Dataset

No License Provided

The dataset contains 768M tokens for language modeling.

Hassan et al. (2024). Dataset: Billion Word Benchmark Dataset. https://doi.org/10.57702/bprj7ycm

DOI retrieved: December 3, 2024
