-
Chinese OntoNotes v5.0
This dataset is used for Named Entity Recognition (NER) tasks. -
LaptopReview dataset
The LaptopReview dataset contains 3,012 mentions of laptop features. -
CoNLL 2003 dataset
The CoNLL 2003 dataset is a collection of newswire articles used for sequence labeling tasks. -
LaptopReview
The LaptopReview dataset refers to Sub-task 1, laptop aspect term (e.g., disk drive) recognition. It consists of 3,845 review sentences, which contain 3,012 AspectTerm... -
NCBI-Disease
The NCBI-Disease dataset consists of 793 PubMed abstracts, which have been split into a training set (593), a development set (100), and a test set (100). The dataset contains... -
Seungjeongwon Corpus
The Seungjeongwon corpus is a historical corpus that contains the diary of a royal secretary from the Joseon Dynasty, with annotated named entities and punctuation markers. -
Few-shot Name Entity Recognition on StackOverflow
-
StackOverflow NER Corpus
The StackOverflow NER corpus contains more than 1,237 question-answer threads from the StackOverflow 10-year archive, annotated with 27 types of entities. -
Word Class Lattices (WCL)
Word Class Lattices (WCL) was introduced by Navigli and Velardi. It consists of 1,871 definitional and 2,847 non-definitional sentences from Wikipedia. -
CL-NERIL: A Cross-Lingual Model for NER in Indian Languages
-
CoNLL-03, CoNLL-04, ACE-05, and CoNLL-12 datasets
The datasets used in the paper for named entity recognition, end-to-end relation extraction, and coreference resolution. -
Penn Tree Bank
The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The... -
Named Entity Recognition with Bidirectional LSTM-CNNs
Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high... -
CoNLL-2003 English NER shared task dataset
The CoNLL-2003 English NER shared task dataset, consisting of 14,041/3,250/3,453 sentences in the training/development/test set respectively, all extracted from Reuters news... -
NEWS 2010 English-Hindi test set
The NEWS 2010 English-Hindi test set is used for transliteration equivalence evaluation. -
NEWS 2009 English-Hindi training set
The NEWS 2009 English-Hindi training set is used for transliteration equivalence learning. -
CoNLL-2012
The CoNLL-2012 shared task dataset is used for coreference resolution.
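
Several entries above state explicit train/development/test sizes. As a rough sanity check, the short Python sketch below tabulates those figures and computes each split's share of the total. The names `SPLITS` and `split_summary` are hypothetical helpers introduced only for this illustration; the numbers are copied from the listing, the units differ per dataset (abstracts, sentences, words), and the Penn Tree Bank sizes are the rounded "k" values given above, not exact counts.

```python
# Split sizes copied from the dataset descriptions in this listing.
# Units differ per dataset, and the Penn Tree Bank figures are rounded.
SPLITS = {
    "NCBI-Disease (abstracts)": {"train": 593, "dev": 100, "test": 100},
    "CoNLL-2003 English NER (sentences)": {"train": 14_041, "dev": 3_250, "test": 3_453},
    "Penn Tree Bank (words, rounded)": {"train": 929_000, "dev": 73_000, "test": 82_000},
}


def split_summary(splits):
    """Return (total size, {split name: fraction of total}) for one dataset."""
    total = sum(splits.values())
    fractions = {name: size / total for name, size in splits.items()}
    return total, fractions


if __name__ == "__main__":
    for dataset, splits in SPLITS.items():
        total, fractions = split_summary(splits)
        shares = ", ".join(f"{name} {frac:.1%}" for name, frac in fractions.items())
        print(f"{dataset}: total {total:,} ({shares})")
```

For NCBI-Disease the three splits sum to the 793 abstracts stated in its description, which is a quick consistency check on the listed figures.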