Named Entity Recognition - Groups

Corpus of Annotated Novels

The dataset comprises 13 full-text novels tagged with protagonistTagger, containing more than 35,000 mentions of literary characters.
- Dataset
- JSON
Protagonists' Tagger in Literary Domain

The dataset comprises 1,300 sentences from 13 classic novels of different genres, manually annotated by a novel reader.
- Dataset
- JSON
MedCATTrainer

MedCATTrainer is a web-based interface for inspecting, adding and correcting biomedical NER+L models through active learning. An additional interface allows research specific...
- Dataset
- JSON
CoNLL 2002

The dataset used for evaluation of the proposed model.
- Dataset
- JSON
GENIA

The GENIA dataset is a biological dataset including five entity types: DNA, RNA, protein, cell line, and cell type.
- Dataset
- JSON
ACE 2004

The dataset used for evaluation of the proposed model.
- Dataset
- JSON
I2B2 2009 Medical Information Extraction Challenge

Named Entity Recognition in Electronic Health Records using Transfer Learning Bootstrapped Neural Networks
- Dataset
- JSON
WikiNER

The dataset includes a larger set of English Wikipedia documents, which are tagged with named entities.
- Dataset
- JSON
FIGER dataset

The FIGER dataset contains 2M data samples labeled with 113 types.
- Dataset
- JSON
OntoNotes dataset

The OntoNotes dataset contains 3.4M automatically labeled entity mentions for training and 11k manually annotated instances, which are split into 8k for the dev set and 2k for the test set.
- Dataset
- JSON
Recipe1M+ Dataset

The Recipe1M+ dataset is a large collection of culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions.
- Dataset
- JSON
Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset

The 3A2M dataset is a large collection of culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions.
- Dataset
- JSON
Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Reci...

The 3A2M+ dataset is a large collection of culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions.
- Dataset
- JSON
ClubFloyd dataset

The ClubFloyd dataset is a collection of human transcripts of text-based games, used to train action candidate generators.
- Dataset
- JSON
SMM4H18_Test

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
SMM4H18_Val

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
SMM4H18_Train

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
BioCreative_TrainTask3.1

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
BioCreative_ValTask3

The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON
BioCreative_TrainTask3.0

The dataset consists of all tweets posted by 212 Twitter users during and after their pregnancy.
- Dataset
- JSON

80 datasets found