-
Dysmenorrhea Dataset
The authors used their own dataset for dysmenorrhea classification. -
N2C2 Smoking Challenge
The authors used the N2C2 smoking challenge dataset for the smoking status classification task. -
N2C2 Obesity Challenge
Clinical note classification is a common clinical NLP task. However, annotated datasets are scarce. The authors used the N2C2 obesity challenge dataset, the N2C2 smoking... -
MasakhaNER 2.0
MasakhaNER 2.0 is a named entity recognition (NER) dataset in the news domain, with annotations for 20 African languages. -
Sanskrit Text Annotation
The Sanskrit text is annotated for various NLP tasks, including sentence boundary detection, canonical word ordering, free-form text annotation of tokens, token classification,... -
Super-NaturalInstructions (SNI) dataset
The Super-NaturalInstructions (SNI) dataset is a collection of 1761 diverse NLP tasks, each belonging to one of 76 task types. -
Towards Improving Selective Prediction Ability of NLP Systems
SNLI, MNLI, Stress Test, Matched Mismatched, Competence, Distraction, and Noise datasets -
Universal Dependencies (UD) treebanks
The paper does not explicitly name its dataset, but it states that the authors used the Universal Dependencies (UD) treebanks. -
Data Management Operations and Recipes
Dataset management operations and recipes for NLP data production. -
A Workflow Manager for Complex NLP and Content Curation Pipelines
A workflow manager for the flexible creation and customisation of NLP processing pipelines. -
MatSci-NLP
The MatSci-NLP dataset is a collection of materials science text for NLP tasks. -
Towards Dark Jargon Interpretation in Underground Forums
Dark jargons are benign-looking words that have hidden, sinister meanings and are used by participants of underground forums for illicit behavior. -
ACL Anthology
The ACL Anthology dataset contains papers on natural language processing, including citation patterns, authorship, and language use over time. -
Cross-lingual semantic representation for NLP with UCCA
The UCCA dataset is used to test the annotation scheme in cross-lingual semantic representation for NLP. -
Multilingual Misinformation & Its Evolution
The dataset used in this study is a combination of data from Google Fact-Check explorer and data directly crawled from the websites of verified signatories of the International... -
GLUE benchmark
The paper does not explicitly describe its dataset, but it states that the authors used three downstream tasks from the GLUE benchmark: Stanford Sentiment Treebank...