Groups: Language Modeling

1 dataset found

Penn Treebank and Wikipedia-90M
The Penn Treebank dataset is used for sentence-level language modeling, and the 90-million-word subset of Wikipedia is used for paraphrasing.