Coronary Arteriography Reports
The dataset consists of coronary arteriography reports collected from Shuguang Hospital, annotated with five entity types and five relation types relevant to medical text processing.
Stanford Natural Language Inference Corpus (SNLI)
The Stanford Natural Language Inference Corpus (SNLI) is used for natural language inference tasks. It contains roughly 570,000 English sentence pairs, each labeled as entailment, contradiction, or neutral.
Stanford Sentiment Treebank (SST-5)
SST-5 is a sentiment analysis dataset of movie reviews annotated with five sentiment labels, ranging from very negative to very positive.
WNUT16 NER
WNUT16 is a shared task dataset for named entity recognition on Twitter. It consists of annotated tweets for identifying named entities in informal online text.
CoNLL 2003 NER dataset
The CoNLL 2003 shared task dataset focuses on named entity recognition, annotating four entity types: persons, locations, organizations, and miscellaneous entities.
CoNLL 2000 chunking dataset
The CoNLL 2000 shared task dataset is used for text chunking (shallow parsing) in natural language processing.
Universal Dependencies v1.3
This dataset provides English part-of-speech tags from the Universal Dependencies corpus. The training set is reduced to its first 500 sentences to make the task more difficult.
ACE Entities/Events
The ACE 2005 dataset consists of documents annotated for entity and event detection, spanning several domains including newswire and blogs.
IMDB Movie Reviews
The IMDB dataset consists of 54,000 movie reviews and serves as a background corpus for evaluating spell correction models; its larger vocabulary supports more robust word recognition.
Stanford Sentiment Treebank (SST)
The Stanford Sentiment Treebank (SST) dataset contains 8,544 movie reviews and is used to evaluate the spell correctors on a downstream sentiment classification task.
Large QA-SRL
The Large QA-SRL dataset is a large-scale resource for semantic role labeling, capturing a diverse set of question-answer pairs that are representative of predicate-argument...