-
DDI corpus of the 2013 DDIExtraction challenge
The DDI corpus of the 2013 DDIExtraction challenge contains thousands of XML files, each of which consists of several records. The dataset is used to train and test a... -
Hub5e-swb Dataset
The Hub5e-swb dataset is the Switchboard portion of the NIST Hub5 2000 English evaluation set, consisting of conversational telephone speech recordings. -
Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish T...
The Copenhagen Corpus contains eye-tracking recordings from natural reading of Danish texts. -
CAP: Corpus of Adjective Pairs
The CAP dataset is a corpus of adjective pairs used to evaluate adjective order preferences in language models. -
Music Corpus
A dataset used for term clustering to build a modular ontology, guided by a core ontology, from domain-specific text. -
Corpus of Spoken Dutch
The Corpus of Spoken Dutch (CGN) is a dataset of spoken Dutch recordings. -
Video Corpus
A corpus of free and representative video content was gathered. This corpus includes progressive-scan videos at 1280x720 resolution, with frame rates between 24 and 30 frames... -
WSJ corpus
The WSJ corpus contains 81.48 hours of speech from 283 adults. -
UKWaC and Wackypedia corpora
A large text corpus compiled from the UKWaC and Wackypedia corpora. -
Switchboard Corpus
The Switchboard corpus is a dataset of two-sided conversational telephone speech recordings between speakers of American English. -
MuST-C: a Multilingual Speech Translation Corpus
MuST-C is a multilingual speech translation corpus. -
Leela’s corpus
The dataset contains word order frequencies from Leela’s corpus, which are used as a proxy for cognitive cost. -
Switchboard
Human speech data comprises a rich set of domain factors such as accent, syntactic and semantic variety, or acoustic environment. -
Librispeech
The Librispeech dataset is a large-scale speaker-dependent speech corpus containing 1080 hours of speech, 5600 utterances, and 1000 speakers. -
NLPContributionGraph Trial Dataset
An Annotation Scheme for Machine Reading of Scholarly Contributions in Natural Language Processing Literature. This dataset is the result of a pilot annotation exercise to...