Text Classification - Groups

Divar Dataset

A dataset for measuring the domain similarity of Persian texts, generated from a dataset of advertisements posted on the Divar application.
- Dataset
- JSON
Didi Ride-Sharing Comment Dataset

The benchmark ride-sharing user-experience comment dataset was constructed from real comments on ride-sharing orders in the main city zone within the time period from Mar...
- Dataset
- JSON
AmazonTitles-670K

The dataset used in the LightDXML paper for extreme multi-label classification.
- Dataset
- JSON
WikiSeeAlsoTitles-350K

The dataset used in the LightDXML paper for extreme multi-label classification.
- Dataset
- JSON
Wiki10-31K

The dataset used in the LightDXML paper for extreme multi-label classification.
- Dataset
- JSON
EURLex-4K

The dataset used in the LightDXML paper for extreme multi-label classification.
- Dataset
- JSON
Towards Improving Selective Prediction Ability of NLP Systems

SNLI, MNLI, Stress Test, Matched Mismatched, Competence, Distraction, and Noise datasets
- Dataset
- JSON
AG News Dataset

AG News: news articles from over 2,000 news sources, annotated by type of news (World, Sports, Business, and Science/Tech). It provides 120k training and 7.6k test examples.
- Dataset
- JSON
CNN/DailyMail and XSum

The CNN/DailyMail dataset is a collection of news articles, and the XSum dataset is a collection of news articles with summaries.
- Dataset
- JSON
Clickbait Challenge 2017

The Clickbait Challenge 2017 dataset, a collection of social media posts and their corresponding article titles, used for clickbait detection.
- Dataset
- JSON
Diggs dataset

The dataset used for testing the sLDA model [16].
- Dataset
- JSON
Fake News Challenge Stage 1 (FNC-1)

The FNC-1 dataset frames stance detection as a supervised classification task: given a headline and an article body, the goal is to automatically predict the body's stance toward the headline (agree, disagree, discuss, or unrelated).
- Dataset
- JSON
ImageNet and SST2 datasets

The datasets used in this study for image (ImageNet) and text (SST2) classification tasks.
- Dataset
- JSON
LLM dataset

The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...
- Dataset
- JSON
MMLU dataset

The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) dataset, which consists of 57 tasks from Science, Technology, Engineering, and Math (STEM),...
- Dataset
- JSON
Bibtex

The dataset is used for multilabel learning tasks. It contains 7395 documents, each labeled with a subset of 159 possible tags.
- Dataset
- JSON
CAL500

In text categorization, a document may be associated with a range of topics, such as science, entertainment, and news.
- Dataset
- JSON
SST-2, Irony, IronyB, TREC6, and SNIPS

The datasets used in this paper are SST-2, Irony, IronyB, TREC6, and SNIPS.
- Dataset
- JSON
AGNews

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.
- Dataset
- JSON
CIFAR-100 and AGNews

Two datasets used for multi-task learning: CIFAR-100 and AGNews.
- Dataset
- JSON

211 datasets found