BEA 2019 shared task dataset

The Building Educational Applications (BEA) shared task on GEC provides datasets including the Cambridge English Write & Improve corpus, which is composed of texts written...

Dataset
JSON

CoNLL 2014 shared task dataset

The CoNLL 2014 shared task dataset consists of essays written by undergraduate students, annotated for grammatical errors.

Dataset
JSON

First Certificate in English (FCE) dataset

The First Certificate in English (FCE) dataset contains essays written by non-native learners of English assessed in a language exam, annotated for language errors and...

Dataset
JSON

WMT19 QE Datasets

The dataset consists of parallel data from various corpora used for training and evaluating the bilingual BERT model for translation quality estimation.

Dataset
JSON

QuAC

QuAC is a dataset for question answering in a conversational context, requiring understanding of the multi-turn dialogue history to provide contextually relevant answers derived...

Dataset
JSON

Sarcastic Tweets Dataset

A dataset of 3,000 sarcastic tweets, each interpreted by five human judges, focusing on the task of sarcasm interpretation.

Dataset
JSON

Sarcasm Interpretation Dataset

The dataset contains 4,762 pairs of sarcastic messages and hearer interpretations, collected through a crowdsourcing experiment.

Dataset
JSON

MedNLI

The MedNLI dataset is used to predict the entailment relation between a pair of sentences, with premises taken from doctors' notes in the clinical dataset MIMIC-III.

Dataset
JSON

MultiNLI

The MultiNLI corpus is a dataset designed to assist in learning natural language inference, featuring sentence pairs labeled as entailment, neutral, or contradiction, which aid...

Dataset
JSON

Sexism Categorization Dataset

The dataset comprises 13,023 accounts of sexism, including first-person accounts from survivors, each tagged with at least one of 23 categories of sexism.

Dataset
JSON

ConvAI2 Dataset

The ConvAI2 dataset, derived from Persona-Chat, contains dialogues between crowdworkers who role-play as assigned personas, enabling the development of conversational agents...

Dataset
JSON

REST dataset

The REST dataset is derived from restaurant reviews and contains review sentences with aspect sentiment annotations for aspect-based sentiment analysis.

Dataset
JSON

LAPTOP dataset

The LAPTOP dataset is used for aspect-based sentiment analysis, containing review sentences along with gold standard aspect sentiment annotations.

Dataset
JSON

OCNLI

OCNLI is a dataset for natural language inference adapted for the Chinese language, consisting of premise-hypothesis pairs.

Dataset
JSON

BQ Corpus

BQ Corpus is a large-scale dataset for sentence semantic equivalence identification in Chinese.

Dataset
JSON

LCQMC

LCQMC is a large-scale Chinese question matching corpus used for determining the semantic equivalence of question pairs.

Dataset
JSON

TNEWS

TNEWS is a short text classification dataset consisting of news titles and keywords requiring classification into one of 15 classes.

Dataset
JSON

THUCNews

THUCNews is a dataset used for news categorization tasks in different genres, containing 50K news articles in ten domains.

Dataset
JSON

ChnSentiCorp

ChnSentiCorp is a dataset for sentiment classification of Chinese documents, in which each text is labeled as positive or negative.

Dataset
JSON

CJRC

CJRC is a dataset for machine reading comprehension specializing in Chinese legal judgments, containing yes/no questions, no-answer questions, and span-extraction questions.

Dataset
JSON

20,499 datasets found