2 datasets found

Tags: Word-Level Language Modeling

  • One Billion Word

    The One Billion Word dataset is a large text corpus containing 0.8 billion words drawn from a vocabulary of 793,471 words. The dataset is used for word-level language...
  • Penn Tree Bank

    The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
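The word counts and vocabulary sizes quoted in these entries are the basic statistics of a word-level corpus. The sketch below shows how such figures, and a capped vocabulary that maps rare words to an unknown token, might be computed. It assumes plain whitespace tokenization and a hypothetical `<unk>` token, and it uses made-up toy sentences. The actual benchmarks ship with their own preprocessing, so this is an illustration only, not how either dataset was built.

```python
from collections import Counter

def corpus_stats(lines):
    """Return (token_count, vocab_size) for whitespace-tokenized text (assumed tokenization)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return sum(counts.values()), len(counts)

def build_vocab(lines, max_size=None, unk="<unk>"):
    """Map the max_size most frequent words to integer ids; id 0 is the hypothetical unk token."""
    counts = Counter(tok for line in lines for tok in line.split())
    vocab = {unk: 0}
    for word, _ in counts.most_common(max_size):  # most_common(None) returns every word
        if word != unk:
            vocab[word] = len(vocab)
    return vocab

def encode(line, vocab, unk="<unk>"):
    """Convert a sentence to ids, replacing out-of-vocabulary words with the unk id."""
    return [vocab.get(tok, vocab[unk]) for tok in line.split()]

toy = ["the cat sat on the mat", "the dog sat"]  # hypothetical toy corpus
print(corpus_stats(toy))  # (9, 6): 9 tokens, 6 distinct words

vocab = build_vocab(toy, max_size=2)
print(encode("the cat sat", vocab))  # [1, 0, 2]: "cat" falls outside the 2-word cap
```

Capping the vocabulary keeps the output layer of a word-level model tractable, at the cost of collapsing rare words into a single unknown token.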
You can also access this registry using the API (see API Docs).