Coronary Arteriography Reports

The dataset consists of coronary arteriography reports collected from Shuguang Hospital, annotated with five entity types and five relation types for medical text processing.

Dataset
JSON

Stanford Natural Language Inference Corpus (SNLI)

SNLI is a collection of English sentence pairs, each labeled as entailment, contradiction, or neutral, used for natural language inference.

Dataset
JSON

Stanford Sentiment Treebank (SST-5)

SST-5 is a sentiment analysis dataset of sentences from movie reviews, each labeled with one of five sentiment classes.

Dataset
JSON

WNUT16 NER

WNUT16 is a shared-task dataset for named entity recognition on Twitter, consisting of annotated tweets for identifying named entities in informal text.

Dataset
JSON

GENIA NER

The GENIA NER dataset consists of annotated Medline abstracts that contain information on biological entities such as proteins and genes, used for named entity recognition in...

Dataset
JSON

CoNLL 2003 NER dataset

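The CoNLL 2003 shared task dataset targets language-independent named entity recognition, with English and German text annotated for persons, locations, organizations, and miscellaneous entities.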

Dataset
JSON

CoNLL 2000 chunking dataset

The CoNLL 2000 shared task dataset is used for text chunking (shallow parsing).

Dataset
JSON

Universal Dependencies v1.3

This variant contains English part-of-speech tags from the Universal Dependencies corpus, with the training set reduced to its first 500 sentences to make the task harder.

Dataset
JSON

ACE Entities/Events

The ACE 2005 dataset consists of documents annotated for entity and event detection, drawn from several genres including newswire and weblogs.

Dataset
JSON

MSRA

The MSRA dataset comes from the news domain and is widely used for Chinese named entity recognition.

Dataset
JSON

Weibo

Weibo NER is built from Chinese social media text and contains several types of named entities.

Dataset
JSON

IMDB Movie Reviews

The IMDB dataset consists of 54,000 movie reviews and serves as a background corpus for evaluating spell-correction models; its larger vocabulary supports robust word recognition.

Dataset
JSON

Stanford Sentiment Treebank (SST)

The Stanford Sentiment Treebank (SST) dataset contains 8,544 movie reviews and is used to evaluate spell correctors on a downstream sentiment classification task.

Dataset
JSON

NewsQA

NewsQA is a machine comprehension dataset of questions and answers derived from news articles, aimed at developing models that can understand and reason about their content.

Dataset
JSON

QA-RE

QA-RE is a dataset that casts relation extraction as question-answer pairs, supporting research on how QA data can be leveraged to extract relational information.

Dataset
JSON

Large QA-SRL

The Large QA-SRL dataset is a large-scale dataset designed for semantic role labeling, capturing a diverse set of question-answer pairs that are representative of predicate-argument...

Dataset
JSON

QAMR

The QAMR dataset captures a broad range of questions and answers that relate to predicate-argument structures, focusing on implicit arguments and inferred relations not captured...

Dataset
JSON

WNED-CWEB

The WNED-CWEB dataset is a benchmark for named entity disambiguation.

Dataset
JSON

WNED-WIKI

The WNED-WIKI dataset is designed for evaluating entity disambiguation systems.

Dataset
JSON

ACE2004

The ACE2004 dataset is used for various information extraction tasks including entity recognition and disambiguation.

Dataset
JSON
