Seungjeongwon Corpus
The Seungjeongwon corpus is a historical corpus that contains the diary of a royal secretary from the Joseon Dynasty, with annotated named entities and punctuation markers.
Few-shot Named Entity Recognition on StackOverflow
StackOverflow NER Corpus
The StackOverflow NER corpus contains more than 1,237 question-answer threads from a 10-year StackOverflow archive, annotated with 27 entity types.
Word Class Lattices (WCL)
Word Class Lattices (WCL) was introduced by Navigli and Velardi. It consists of 1,871 definitional and 2,847 non-definitional sentences from Wikipedia.
Rare-NER, Bio-NER, and Twitter-POS datasets
The Rare-NER, Bio-NER, and Twitter-POS datasets are used for named entity recognition and part-of-speech tagging.
CL-NERIL: A Cross-Lingual Model for NER in Indian Languages
CoNLL-03, CoNLL-04, ACE-05, and CoNLL-12 datasets
These datasets are used in the paper for named entity recognition, end-to-end relation extraction, and coreference resolution.
Fast End-to-End Wikification
A runtime-oriented tool for context-free Wikification based on Wikipedia redirects.
From TagMe to WAT: a new entity annotator
What did you mention? A large scale mention detection benchmark for spoken an...
A large-scale mention detection benchmark for spoken and written text.
Co-EM Dataset
A named entity recognition dataset.
i2b2 2010 dataset
The i2b2 2010 dataset is a corpus of clinical text that was created as part of an NLP challenge in 2010.
i2b2 2012 dataset
The i2b2 2012 dataset is a corpus of clinical text that was created as part of an NLP challenge in 2012.
Named Entity Recognition with Bidirectional LSTM-CNNs
Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high...
Resume, CoNLL-2004, FewFC
Resume is used for named entity recognition (NER), CoNLL-2004 for relation extraction (RE), and FewFC for event extraction (EE).
CoNLL-2003 English NER shared task dataset
The CoNLL-2003 English NER shared task dataset, consisting of 14,041/3,250/3,453 sentences in the training/development/test set respectively, all extracted from Reuters news...
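To make the sentence counts above concrete, here is a minimal sketch of a reader for CoNLL-2003-style files. It assumes the commonly distributed layout of the English data: one token per line with whitespace-separated columns (token, POS tag, chunk tag, NER tag), blank lines between sentences, and `-DOCSTART-` lines marking document boundaries. The function `read_conll2003` is a hypothetical helper, not part of any library, and it keeps only the token and the last (NER) column.

```python
from typing import List, Tuple

# A sentence is a list of (token, NER tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll2003(text: str) -> List[Sentence]:
    """Split CoNLL-2003-style text into sentences of (token, NER tag) pairs.

    Assumes one token per line, whitespace-separated columns with the NER tag
    in the last column, and blank lines separating sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            # A blank line closes the current sentence, if one is open.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        if columns[0] == "-DOCSTART-":
            # Document boundary marker; not a real token.
            continue
        current.append((columns[0], columns[-1]))
    # The file may not end with a blank line.
    if current:
        sentences.append(current)
    return sentences
```

Counting sentences in a split is then `len(read_conll2003(text))`. Exact counts can vary slightly between redistributed copies of the data, so a mismatch with the figures above is worth checking against the copy in use rather than treating it as a parser bug.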
ACE 2005, WebNLG, CoNLL, NYT, and FB15k-237
The datasets used in the paper are ACE 2005, WebNLG, CoNLL, NYT, and FB15k-237. The ACE 2005 dataset is a collection of news articles, while WebNLG is a corpus used for natural...
NEWS 2010 English-Hindi test set
The NEWS 2010 English-Hindi test set is used for transliteration equivalence evaluation.
NEWS 2009 English-Hindi training set
The NEWS 2009 English-Hindi training set is used for transliteration equivalence learning.