LDC2020T07
Cross-lingual Abstract Meaning Representation (AMR) parsing dataset.
MasakhaNER 2.0
MasakhaNER 2.0 is a news-domain NER dataset with annotations for 20 African languages.
PaDaS-Lab/legal-reference-annotations
A dataset of privacy policies annotated with GDPR-compliant named entities.
Legal-Entity-Recognition
A dataset of German legal reference annotations.
Chinese NER using lattice LSTM
Chinese named entity recognition (NER) dataset associated with the lattice LSTM approach.
Financial news corpus for company name recognition
Includes a financial news corpus, a company-name dictionary, the 35wSents dataset, the Albert65kError dataset, and development and test sets.
PANX and UDPOS datasets
The PANX and UDPOS datasets are used for named entity recognition and part-of-speech tagging in the CJKV (Chinese, Japanese, Korean, and Vietnamese) languages.
Facebook Product Name Identification Dataset
A dataset of Facebook posts used for product name identification.
SciFoodNER
A dataset of 88,526 ingredient phrases, created using Stratified Entity Frequency Sampling.
Polyglot-ner
Polyglot-ner is a multilingual NER dataset.
WikiGoldSK
WikiGoldSK is a manually annotated Slovak NER dataset.
2010 i2b2/VA challenge dataset
The 2010 i2b2/VA challenge dataset consists of clinical summaries from three medical sites: Partners Healthcare, Beth Israel Deaconess Medical Center, and the...
TAC2017 Adverse Drug Reaction Extraction Task Testing Dataset
The test set for the TAC2017 adverse drug reaction extraction task.
TAC2017 Adverse Drug Reaction Extraction Task Training Dataset
The training set for the TAC2017 adverse drug reaction extraction task.
TAC2017 Adverse Drug Reaction Extraction Task
The dataset for the TAC2017 adverse drug reaction extraction task.