BEA 2019 shared task dataset
The Building Educational Applications (BEA) shared task on grammatical error correction (GEC) provides datasets including the Cambridge English Write & Improve corpus, which is composed of texts written...
CoNLL 2014 shared task dataset
The CoNLL 2014 shared task dataset consists of essays written by undergraduate students and annotated for grammatical errors.
First Certificate in English (FCE) dataset
The First Certificate in English (FCE) dataset contains essays written by non-native learners of English and assessed in a language exam, annotated for language errors and...
WMT19 QE Datasets
The dataset consists of parallel data from various corpora, used for training and evaluating the bilingual BERT model for translation quality estimation.
Sarcastic Tweets Dataset
A dataset of 3,000 sarcastic tweets, each interpreted by five human judges, built for the task of sarcasm interpretation.
Sarcasm Interpretation Dataset
The dataset contains 4,762 pairs of sarcastic messages and hearer interpretations, collected through a crowdsourcing experiment.
Sexism Categorization Dataset
The dataset comprises 13,023 accounts of sexism, including first-person accounts from survivors, each tagged with at least one of 23 categories of sexism.
ConvAI2 Dataset
The ConvAI2 dataset, derived from Persona-Chat, contains dialogues between crowdworkers who role-play assigned personas, enabling the development of conversational agents...
REST dataset
The REST dataset is derived from restaurant reviews and contains review sentences with aspect sentiment annotations for aspect-based sentiment analysis.
LAPTOP dataset
The LAPTOP dataset, derived from laptop reviews, is used for aspect-based sentiment analysis and contains review sentences with gold-standard aspect sentiment annotations.
ChnSentiCorp
ChnSentiCorp is a dataset for sentiment classification of Chinese documents, in which each text is labeled as either positive or negative.