-
One Billion Word
The One Billion Word dataset is a large text dataset containing 0.8 billion words drawn from a vocabulary of 793,471 words. The dataset is used for word-level language...
-
Penn Tree Bank
The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...