Jigsaw Dataset
The Jigsaw dataset is a collection of text samples, each labeled as toxic or non-toxic.
Amazon
A series of datasets introduced in [46], comprising large corpora of product reviews crawled from Amazon.com. Top-level product categories on...
GoogleNews
A collection of news articles from Google News.
20NewsGroups
A collection of roughly 20,000 newsgroup posts spread across 20 topics.
Penn Treebank
The Penn Treebank dataset contains one million words of 1989 Wall Street Journal material annotated in Treebank II style, with 42k sentences of varying lengths.
Amazon review dataset
The Amazon review dataset is used for multi-source domain adaptation. It contains review texts and ratings of purchased products, grouped into categories. Following...
BioText dataset
The BioText dataset contains more than 3,500 text samples classified into one of eight classes, which specify the type of semantic relationship between disease and treatment...
Rotten Tomatoes Movie Reviews (RT) and IMDB
Two public benchmark datasets for sentiment analysis: Rotten Tomatoes...
Book Categories
Two text classification datasets for evaluating the quality of interpretability methods.
Ott dataset
A dataset for deceptive opinion detection.
BookCorpus
A dataset for unsupervised sentence representation learning, consisting of paragraphs of unlabeled text.
Reuters RCV1-v2
Reuters RCV1-v2 contains 804,414 newswire articles. Its 103 topics form a tree hierarchy, so documents typically have multiple labels. The data was randomly...
Penn Treebank dataset
The Penn Treebank dataset, a large corpus of Wall Street Journal text annotated with part-of-speech tags and syntactic parse trees.