Text Classification - Groups

AG's News Corpus

Rcv1: A new benchmark collection for text categorization research

AGNews, 20News, NYT, IMDB

AGNews, 20News, NYT, IMDB are datasets used for weakly supervised text classification.
HateXplain

The HateXplain dataset contains 20,000 posts from Gab and Twitter, annotated with hate/offensive/normal labels.
CLIMABENCH

CLIMABENCH is a benchmark of climate-related text classification tasks. It collates five existing climate change-related text datasets, including CLIMATEXT, CLIMATESTANCE,...
AllNews

A collection of news articles from AllNews.
Wiki40B

A collection of documents from Wikipedia.
NeurIPS dataset

The NeurIPS dataset is a collection of 7,241 papers published at NeurIPS from 1987 to 2016.
WOS

The WOS dataset is a text classification dataset containing scientific articles from Web of Science.
Wikipedia dataset

The Wikipedia dataset contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...
IMDB Document

A collection of text sequences used for text classification tasks.
Yelp 2014 Document

A collection of text sequences used for text classification tasks.
Yelp 2013 Document

A collection of text sequences used for text classification tasks.
Yelp Review Dataset

The Yelp review dataset contains hotel and restaurant reviews that Yelp either filtered (spam) or recommended (legitimate).
20News

Topic modeling is a widely used tool for unsupervised text analysis. However, comprehensive evaluation of topic models remains challenging.
Word2Vec

Bilingual word embeddings from parallel and non-parallel corpora for cross-language text classification
20NG Dataset

The 20NG dataset is a text classification dataset containing 20 categories.
Ohsumed Dataset

The Ohsumed dataset is a text classification dataset containing 3,357 documents.
Reuters Dataset

The Reuters dataset is a text classification dataset containing 21,578 samples.
Text Classification Dataset

A dataset used for text classification with a variant of the typical text classification model based on convolution and max-pooling layers.

182 datasets found