Dataset - LDM

ResumeNER

This dataset is used for Named Entity Recognition (NER) tasks.
- Dataset
- JSON
Chinese OntoNotes v5.0

This dataset is used for Named Entity Recognition (NER) tasks.
- Dataset
- JSON
LaptopReview dataset

The LaptopReview dataset contains 3,012 mentions of laptop features.
- Dataset
- JSON
CoNLL 2003 dataset

The CoNLL 2003 dataset is a collection of news-wire articles used for sequence labeling tasks.
- Dataset
- JSON
LaptopReview

The LaptopReview dataset refers to Sub-task 1, the recognition of laptop aspect terms (e.g., disk drive). It consists of 3,845 review sentences, which contain 3,012 AspectTerm...
- Dataset
- JSON
NCBI-Disease

The NCBI-Disease dataset consists of 793 PubMed abstracts, which have been separated into a training set (593), a development set (100), and a test set (100). The dataset contains...
- Dataset
- JSON
BC5CDR

The BC5CDR dataset consists of 1,500 PubMed articles, which have been separated into a training set (500), a development set (500), and a test set (500). The dataset contains 15,935...
- Dataset
- JSON
Probase

Probase is a probabilistic knowledge base that contains millions of entities and concepts. One of the advantages of Probase is that in comparison with the well-known knowledge...
- Dataset
- JSON
Seungjeongwon Corpus

The Seungjeongwon corpus is a historical corpus that contains the diary of a royal secretary from the Joseon Dynasty, with annotated named entities and punctuation markers.
- Dataset
- JSON
Few-shot Named Entity Recognition on StackOverflow

Few-shot Named Entity Recognition on StackOverflow
- Dataset
- JSON
StackOverflow NER Corpus

The StackOverflow NER corpus contains more than 1,237 question-answer threads from the StackOverflow 10-year archive, with 27 types of entities.
- Dataset
- JSON
Word Class Lattices (WCL)

Word Class Lattices (WCL) was introduced by Navigli and Velardi. It consists of 1,871 definitional and 2,847 non-definitional sentences from Wikipedia.
- Dataset
- JSON
CL-NERIL: A Cross-Lingual Model for NER in Indian Languages

CL-NERIL: A Cross-Lingual Model for NER in Indian Languages
- Dataset
- JSON
CoNLL-03, CoNLL-04, ACE-05, and CoNLL-12 datasets

The datasets used in the paper for named entity recognition, end-to-end relation extraction, and coreference resolution.
- Dataset
- JSON
Penn Tree Bank

The Penn Tree Bank dataset is a corpus split into a training set of 929k words, a validation set of 73k words, and a test set of 82k words. The...
- Dataset
- JSON
Named Entity Recognition with Bidirectional LSTM-CNNs

Named entity recognition is a challenging task that has traditionally required large amounts of knowledge in the form of feature engineering and lexicons to achieve high...
- Dataset
- JSON
CoNLL-2003 English NER shared task dataset

The CoNLL-2003 English NER shared task dataset, consisting of 14,041/3,250/3,453 sentences in the training/development/test sets, respectively, all extracted from Reuters news...
- Dataset
- JSON
NEWS 2010 English-Hindi test set

The NEWS 2010 English-Hindi test set is used for transliteration equivalence evaluation.
- Dataset
- JSON
NEWS 2009 English-Hindi training set

The NEWS 2009 English-Hindi training set is used for transliteration equivalence learning.
- Dataset
- JSON
CoNLL-2012

The CoNLL-2012 shared task dataset is used for coreference resolution tasks.
- Dataset
- JSON

59 datasets found