The Google Billion Word dataset (also known as the One Billion Word Benchmark) is one of the largest language-modeling datasets, containing almost one billion tokens and a vocabulary of roughly 800K words, drawn from an English corpus of 30,301,028 shuffled sentences.
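As a quick way to get started, here is a minimal sketch of iterating over the corpus with the Hugging Face `datasets` library; the `lm1b` dataset id and the `text` field name are assumptions about how the benchmark is published on that hub.

```python
# Minimal sketch: stream the One Billion Word Benchmark sentence by sentence.
# Assumes the corpus is available on the Hugging Face Hub under the id "lm1b"
# with each example exposing a "text" field containing one shuffled sentence.
from datasets import load_dataset

# Streaming avoids downloading the ~1B-token training split up front.
dataset = load_dataset("lm1b", split="train", streaming=True)

# Print the first few sentences as a sanity check.
for i, example in enumerate(dataset):
    print(example["text"])
    if i == 2:
        break
```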