-
MasakhaNER 2.0
MasakhaNER 2.0 is a news-domain NER dataset with annotations for 20 African languages. -
PaDaS-Lab/legal-reference-annotations
The dataset of privacy policies annotated using GDPR-compliant named entities. -
Chinese named entity recognition method based on BERT
Chinese named entity recognition method based on BERT -
Flat Chinese NER using flat-lattice transformer
Flat Chinese NER using flat-lattice transformer -
Chinese NER using lattice LSTM
Chinese NER using lattice LSTM -
Financial news corpus for company name recognition
Comprises a financial news corpus, a company names dictionary, the 35wSents dataset, the Albert65kError dataset, and development and test datasets. -
SBL-51abbr
The SBL-51abbr dataset consists of 51 randomly selected entries from SBL3. -
PANX and UDPOS datasets
The PANX and UDPOS datasets are used for named entity recognition and part-of-speech tagging across the CJKV languages. -
Arabic Names Transliterated in Hebrew
The dataset used for training the Arabic names transliteration model, containing 2,000 Arabic names transliterated in Hebrew. -
Arabic Names Transliterated in English
The dataset used for training the Arabic names transliteration model, containing 3,600 Arabic names transliterated in English. -
Hebrew Names Transliterated in English
The dataset used for training the language identification model, containing 16,500 Hebrew names transliterated in English, 3,600 Arabic names transliterated in English, and... -
English to Hebrew Transliteration
The dataset used for transliterating person names from English to Hebrew, supporting both backward transliteration of Hebrew names and sideways transliteration of Arabic names. -
Facebook Product Name Identification Dataset
A dataset of Facebook posts used for product name identification. -
SciFoodNER
A dataset of 88,526 ingredient phrases, created using Stratified Entity Frequency Sampling. -
Polyglot-ner
Polyglot-ner is a multilingual NER dataset. -
WikiGoldSK
WikiGoldSK is a manually annotated Slovak NER dataset. -
TAC2017 Adverse Drug Reaction Extraction Task Testing Dataset
The testing dataset used for the adverse drug reaction extraction task in TAC2017. -
TAC2017 Adverse Drug Reaction Extraction Task Training Dataset
The training dataset used for the adverse drug reaction extraction task in TAC2017.