-
Semantic Scholar Open Research Corpus
The Semantic Scholar Open Research Corpus contains metadata for 46,947,044 published research papers in computer science, neuroscience, and biomedicine from 1936 to 2019. -
ROC-Stories: A Corpus for Evaluating Story Generation Models
-
PropBank.Br
PropBank.Br is a corpus of Brazilian Portuguese text annotated with semantic roles. -
Asian Scientific Paper Excerpt Corpus (ASPEC)
-
Penn Discourse Treebank 2.0
The Penn Discourse Treebank 2.0 (PDTB 2.0) is a large-scale corpus of 2,312 Wall Street Journal (WSJ) articles annotated with discourse relations. -
Brown Corpus
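The Brown Corpus is a roughly one-million-word collection of American English text sampled across a range of genres. It is often used as an out-of-domain test set for models trained on Wall Street Journal data. -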
Switchboard Corpus
The Switchboard corpus is a collection of about 2,400 two-sided telephone conversations between speakers of American English, recorded on assigned topics. -
Penn Treebank
The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths. -
LibriSpeech
LibriSpeech is a large-scale corpus of approximately 1,000 hours of read English speech, sampled at 16 kHz and derived from LibriVox audiobooks, with recordings from more than 2,000 speakers.