MasakhaNER 2.0
MasakhaNER 2.0 is a news-domain NER dataset with annotations for 20 African languages.
A dataset of privacy policies annotated with GDPR-compliant named entities.
Chinese NER using lattice LSTM
Chinese NER using lattice LSTM
Financial news corpus for company name recognition
Comprises a financial news corpus, a company-name dictionary, the 35wSents and Albert65kError datasets, and development and test sets.
PANX and UDPOS datasets
The PANX and UDPOS datasets are used for named entity recognition and part-of-speech tagging in the CJKV (Chinese, Japanese, Korean, Vietnamese) languages.
Facebook Product Name Identification Dataset
A dataset of Facebook posts used for product name identification.
A dataset of 88,526 ingredient phrases, created using Stratified Entity Frequency Sampling.
Polyglot-NER is a multilingual NER dataset.
WikiGoldSK is a manually annotated Slovak NER dataset.
TAC2017 Adverse Drug Reaction Extraction Task Testing Dataset
The test set for the adverse drug reaction extraction task in TAC 2017.
TAC2017 Adverse Drug Reaction Extraction Task Training Dataset
The training set for the adverse drug reaction extraction task in TAC 2017.
TAC2017 Adverse Drug Reaction Extraction Task
The dataset for the adverse drug reaction extraction task in TAC 2017.
CoNLL 2002
The CoNLL 2002 shared-task NER dataset (Spanish and Dutch), used to evaluate the proposed model.
i2b2 2009 Medical Information Extraction Challenge
Named Entity Recognition in Electronic Health Records using Transfer Learning Bootstrapped Neural Networks
ClubFloyd dataset
The ClubFloyd dataset is a collection of transcripts of humans playing text-based games, used to train action candidate generators.