Billion Word Benchmark Dataset

License

No License Provided

The dataset contains 768M tokens for language modeling.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Hassan et al. (2024). Dataset: Billion Word Benchmark Dataset. https://doi.org/10.57702/bprj7ycm

DOI retrieved: December 3, 2024

Additional Info

Field	Value
Created	December 3, 2024
Last update	December 3, 2024
Author	Hassan et al.
