Text Classification - Groups

Yahoo

The Yahoo dataset, containing leaked passwords, used for training and testing the proposed model.
- Dataset
- JSON
DBP

A dataset used for sentiment analysis and topic classification tasks.
- Dataset
- JSON
AG

A dataset used for sentiment analysis and topic classification tasks.
- Dataset
- JSON
BERT

The dataset used in this paper is a pre-trained BERT model trained on English Wikipedia and Books datasets.
- Dataset
- JSON
Reuters-21578

Text classification has long been an interesting research field. Its aim is to develop algorithms that find the categories of given documents.
- Dataset
- JSON
Amazon Review

The Amazon Review dataset is a widely used benchmark dataset for cross-domain sentiment analysis.
- Dataset
- JSON
Text Classification based on Multiple Block Convolutional Highways

Text classification based on Multiple Block Convolutional Highways
- Dataset
- JSON
Yelp Dataset Challenge

The Yelp dataset challenge contains reviews and images of restaurants, with the goal of recommending images for each review.
- Dataset
- JSON
C4

The dataset used for pre-training language models, containing a large collection of text documents.
- Dataset
- JSON
Amazon@Beauty and Amazon@Books datasets

The Amazon@Beauty and Amazon@Books datasets are collections of product reviews from Amazon.com.
- Dataset
- JSON
OpenWebText Corpus

A dataset for language modeling, where the goal is to predict the next word in a sequence given the previous words.
- Dataset
- JSON
The pushshift reddit dataset

The pushshift reddit dataset
- Dataset
- JSON
Conditional Generative Matching Model for Multi-lingual Reply Suggestion

A Conditional Generative Matching Model for Multi-lingual Reply Suggestion
- Dataset
- JSON
IMDB dataset

The IMDB dataset is a polarity dataset for sentiment analysis or text classification. It contains 50,000 sentences and their binary class labels, being either "Positive" or...
- Dataset
- JSON
Disin dataset

The Disin dataset is a fake news dataset on Kaggle, including 12,600 fake news articles and 12,600 truthful news articles.
- Dataset
- JSON
SST-2

The dataset used for the experiments across ten models, ranging from bag-of-words models to pre-trained transformers, and find that a model having higher AUC does not necessarily...
- Dataset
- JSON
COVID-19 Research Articles Classification

The dataset used for text classification to support Epistemonikos' effort to filter and categorize research articles related to COVID-19.
- Dataset
- JSON
AG News

The dataset used in the paper is a language domain dataset, specifically for sentiment classification, named AG News. The dataset is used to evaluate the performance of...
- Dataset
- JSON
AGNews Dataset

The AGNews dataset is a collection of news articles, where each article is labeled with a topic (e.g. politics, sports, etc.).
- Dataset
- JSON
Amazon

The dataset used in the paper is a series of datasets introduced in [46], comprising large corpora of product reviews crawled from Amazon.com. Top-level product categories on...
- Dataset
- JSON

182 datasets found