Named Entity Recognition - Groups

Seungjeongwon Corpus

The Seungjeongwon corpus is a historical corpus that contains the diary of a royal secretary from the Joseon Dynasty, with annotated named entities and punctuation markers.

Few-shot Named Entity Recognition on StackOverflow

StackOverflow NER Corpus

The StackOverflow NER corpus contains more than 1,237 question-answer threads from the StackOverflow 10-year archive, annotated with 27 entity types.

Word Class Lattices (WCL)

Word Class Lattices (WCL) was introduced by Navigli and Velardi. It consists of 1,871 definitional and 2,847 non-definitional sentences from Wikipedia.

Rare-NER, Bio-NER, and Twitter-POS datasets

The Rare-NER, Bio-NER, and Twitter-POS datasets are used for named entity recognition and part-of-speech tagging.

NLU-ED

The NLU-ED dataset is a benchmark for named entity recognition, consisting of 69 intent labels, 108 slot labels, and a vocabulary of 7.9k tokens.

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages

CoNLL-03, CoNLL-04, ACE-05, and CoNLL-12 datasets

CoNLL-03, CoNLL-04, ACE-05, and CoNLL-12 are the datasets used in the paper for named entity recognition, end-to-end relation extraction, and coreference resolution.

Fast End-to-End Wikification

A run-time oriented tool for context-free Wikification based on Wikipedia redirects.

From TagMe to WAT: a new entity annotator

From TagMe to WAT: a new entity annotator.

What did you mention? A large scale mention detection benchmark for spoken an...

A large-scale mention detection benchmark for spoken and written text.

Co-EM Dataset

A named entity recognition dataset.

i2b2 2010 dataset

The i2b2 2010 dataset is a corpus of clinical text that was created as part of an NLP challenge in 2010.

i2b2 2012 dataset

The i2b2 2012 dataset is a corpus of clinical text that was created as part of an NLP challenge in 2012.

Named Entity Recognition with Bidirectional LSTM-CNNs

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high...

Resume, CoNLL-2004, FewFC

Resume is used for named entity recognition (NER), CoNLL-2004 for relation extraction (RE), and FewFC for event extraction (EE).

CoNLL-2003 English NER shared task dataset

The CoNLL-2003 English NER shared task dataset, consisting of 14,041/3,250/3,453 sentences in the training/development/test set respectively, all extracted from Reuters news...

ACE 2005, WebNLG, CoNLL, NYT, and FB15k-237

The datasets used in the paper are ACE 2005, WebNLG, CoNLL, NYT, and FB15k-237. The ACE 2005 dataset is a collection of news articles, while WebNLG is a corpus used for natural...

NEWS 2010 English-Hindi test set

The NEWS 2010 English-Hindi test set is used for transliteration equivalence evaluation.

NEWS 2009 English-Hindi training set

The NEWS 2009 English-Hindi training set is used for transliteration equivalence learning.

80 datasets found