Dataset - LDM

One Billion Word

The One Billion Word dataset is a large text corpus containing roughly 0.8 billion words drawn from a vocabulary of 793 471 words. The dataset is used for word-level language...
- Dataset
- JSON
Penn Tree Bank

The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
