-
Penn Tree Bank
The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
-
Polyglot Wikipedia
The Polyglot Wikipedia dataset is used for training and testing the MVLSA model.