-
Global pointer: Novel efficient span-based approach for named entity recognition
Global pointer: Novel efficient span-based approach for named entity recognition. -
CMID, KUAKE-QIC, Intent-Merged
Biomedical intent detection and named entity recognition datasets -
JNLPBA, DDI, BC5CDR, NCBI-Disease, AnatEM
Biomedical intent detection and named entity recognition datasets -
The Pile dataset
The Pile dataset is a large-scale dataset containing 800GB of text data. -
LM-Extraction benchmark
The LM-Extraction benchmark contains 15,000 pairs of prefixes and suffixes derived from The Pile dataset (Gao et al., 2020)... -
Twitter Name Tagging (TNT) and Broad Twitter Corpus (BTC)
Twitter Name Tagging (TNT) and Broad Twitter Corpus (BTC) datasets are used for named entity recognition in social media. -
DSTC-FRAMES-ENHI
An extended dataset, DSTC-FRAMES-ENHI, containing a total of 37,785 samples and 7 entities with 1,106 unique entity values (with IOB prefixes). -
DSTC-FRAMES-EN
A combined dataset formed from two public English task-oriented conversational datasets, from the travel and restaurant domains respectively. -
Chinese OntoNotes v5.0
This dataset is used for Named Entity Recognition (NER) tasks. -
LaptopReview dataset
The LaptopReview dataset contains 3,012 mentions of laptop features. -
CoNLL 2003 dataset
The CoNLL 2003 dataset is a collection of newswire articles used for sequence labeling tasks. -
CoNLL04 dataset
The CoNLL04 dataset is a benchmark for relation and entity recognition.