Named Entity Recognition - Groups - LDM

Corpus of Annotated Novels

The dataset comprises 13 full-text novels tagged with protagonistTagger, containing more than 35,000 mentions of literary characters.
- Dataset
- JSON
Protagonists' Tagger in Literary Domain

The dataset comprises 1,300 sentences from 13 classic novels of different genres, manually annotated by a novel reader.
- Dataset
- JSON
FIGER dataset

The FIGER dataset contains 2M data samples labeled with 113 types.
- Dataset
- JSON
OntoNotes dataset

The OntoNotes dataset contains 3.4M automatically labeled entity mentions for training and 11k manually annotated instances, split into 8k for the dev set and 2k for the test set.
- Dataset
- JSON
Recipe1M+ Dataset

The Recipe1M+ dataset is a large collection of culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions.
- Dataset
- JSON
Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset

The 3A2M dataset is a large collection of culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions.
- Dataset
- JSON
Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Reci...

The 3A2M+ dataset is a large collection of culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions.
- Dataset
- JSON
SMM4H18_Test

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
SMM4H18_Val

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
SMM4H18_Train

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
BioCreative_TrainTask3.1

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
BioCreative_ValTask3

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
BioCreative_TrainTask3.0

The dataset consists of all tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
CMID, KUAKE-QIC, Intent-Merged

Biomedical intent detection and named entity recognition datasets
- Dataset
- JSON
JNLPBA, DDI, BC5CDR, NCBI-Disease, AnatEM

Biomedical intent detection and named entity recognition datasets
- Dataset
- JSON
The Pile dataset

The Pile dataset is a large-scale dataset containing 800GB of text data.
- Dataset
- JSON
LM-Extraction benchmark

The LM-Extraction benchmark, derived from The Pile dataset (Gao et al., 2020), contains 15,000 pairs of prefixes and suffixes derived from The Pile dataset (Gao et al.,...
- Dataset
- JSON
DSTC-FRAMES-ENHI

DSTC-FRAMES-ENHI is an extended dataset containing a total of 37,785 samples and 7 entities with 1,106 unique entity values (with IOB prefixes).
- Dataset
- JSON
DSTC-FRAMES-EN

A combined dataset formed from two public English task-oriented conversational datasets from the travel and restaurant domains, respectively.
- Dataset
- JSON
CoNLL 2003 dataset

The CoNLL 2003 dataset is a collection of news-wire articles used for sequence labeling tasks.
- Dataset
- JSON
