Corpus of Annotated Novels
The dataset comprises 13 full-text novels tagged with protagonistTagger, containing more than 35,000 mentions of literary characters.
Protagonists' Tagger in Literary Domain
The dataset comprises 1,300 sentences from 13 classic novels of different genres, manually annotated by a novel reader.
MedCATTrainer
MedCATTrainer is a web-based interface for inspecting, adding and correcting biomedical named entity recognition and linking (NER+L) models through active learning. An additional interface allows research specific...
CoNLL 2002
The CoNLL-2002 shared-task dataset for language-independent named entity recognition (Spanish and Dutch), used to evaluate the proposed model.
i2b2 2009 Medical Information Extraction Challenge
Used in the paper "Named Entity Recognition in Electronic Health Records using Transfer Learning Bootstrapped Neural Networks".
FIGER dataset
The FIGER dataset contains 2M data samples labeled with 113 types.
OntoNotes dataset
The OntoNotes dataset contains 3.4M automatically labeled entity mentions for training and 11k manually annotated instances, split into 8k for the dev set and 2k for the test set.
Recipe1M+ Dataset
The Recipe1M+ dataset is a large collection of culinary recipes, labeled by category, with extended named entities extracted from the recipe descriptions.
Assorted, Archetypal, and Annotated Two Million (3A2M) Cooking Recipe Dataset
The 3A2M dataset is a large collection of culinary recipes, labeled by category, with extended named entities extracted from the recipe descriptions.
Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Reci...
The 3A2M+ dataset is a large collection of culinary recipes, labeled by category, with extended named entities extracted from the recipe descriptions.
ClubFloyd dataset
The ClubFloyd dataset is a collection of transcripts of humans playing text-based games, used to train action candidate generators.
SMM4H18_Test
The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
SMM4H18_Val
The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
SMM4H18_Train
The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
BioCreative_TrainTask3.1
The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
BioCreative_ValTask3
The dataset consists of tweets posted by 212 Twitter users during and after their pregnancy.
BioCreative_TrainTask3.0
The dataset consists of all tweets posted by 212 Twitter users during and after their pregnancy.