Dataset - LDM

LDC2020T07

Cross-lingual Abstract Meaning Representation (AMR) parsing dataset
- Dataset
- JSON
ANERcorp

ANERcorp is a named entity recognition dataset.
- Dataset
- JSON
MasakhaNER 2.0

MasakhaNER 2.0 is a NER dataset in the news domain, with annotations for 20 African languages.
- Dataset
- JSON
PaDaS-Lab/legal-reference-annotations

A dataset of privacy policies annotated with GDPR-compliant named entities.
- Dataset
- JSON
Legal-Entity-Recognition

A dataset of German legal reference annotations.
- Dataset
- JSON
Chinese NER using lattice LSTM

Chinese NER using lattice LSTM
- Dataset
- JSON
Financial news corpus for company name recognition

Financial news corpus, company names dictionary, 35wSents dataset, Albert65kError dataset, development and test datasets
- Dataset
- JSON
CoNLL03

The CoNLL03 dataset is a low-resource named entity recognition dataset. The dataset contains 4 entity types: person, location, organization, and miscellaneous entities. The...
- Dataset
- JSON
PANX and UDPOS datasets

The PANX and UDPOS datasets are used for named entity recognition and part-of-speech tagging across the CJKV languages.
- Dataset
- JSON
Facebook Product Name Identification Dataset

A dataset of Facebook posts used for product name identification.
- Dataset
- JSON
SciFoodNER

A dataset of 88,526 ingredient phrases, created using Stratified Entity Frequency Sampling.
- Dataset
- JSON
Polyglot-ner

Polyglot-ner is a multilingual NER dataset.
- Dataset
- JSON
WikiGoldSK

WikiGoldSK is a manually annotated Slovak NER dataset.
- Dataset
- JSON
OLID

A collection of text samples from the Offensive Language Identification Dataset (OLID).
- Dataset
- JSON
2010 i2b2/VA challenge dataset

The 2010 i2b2/VA challenge dataset consists of clinical summaries from three different medical sites: Partners Healthcare, Beth Israel Deaconess Medical Center, and the...
- Dataset
- JSON
TAC2017 Adverse Drug Reaction Extraction Task Testing Dataset

The testing dataset used for the adverse drug reaction extraction task in TAC2017.
- Dataset
- JSON
TAC2017 Adverse Drug Reaction Extraction Task Training Dataset

The training dataset used for the adverse drug reaction extraction task in TAC2017.
- Dataset
- JSON
TAC2017 Adverse Drug Reaction Extraction Task

The dataset used for the adverse drug reaction extraction task in TAC2017.
- Dataset
- JSON
AGNews

The source paper does not describe this dataset explicitly; it states only that the authors used a variety of datasets for semi-supervised learning tasks.
- Dataset
- JSON
FinEntity

FinEntity is a high-quality entity-level sentiment classification dataset for financial texts.
- Dataset
- JSON


77 datasets found