-
Divar Dataset
A dataset for measuring the domain similarity of Persian texts, generated from a dataset of advertisements posted on the Divar application. -
Didi Ride-Sharing Comment Dataset
The benchmark ride-sharing comment user experience data set was constructed from the real comments in the main city zone of ride-sharing orders within the time period from Mar... -
AmazonTitles-670K
The dataset used in the LightDXML paper for extreme multi-label classification. -
WikiSeeAlsoTitles-350K
The dataset used in the LightDXML paper for extreme multi-label classification. -
Wiki10-31K
The dataset used in the LightDXML paper for extreme multi-label classification. -
Towards Improving Selective Prediction Ability of NLP Systems
SNLI, MNLI, Stress Test, Matched Mismatched, Competence, Distraction, and Noise datasets -
AG News Dataset
AG News: news articles from over 2,000 news sources, annotated by type of news: Sports, World, Business, and Science/Tech. 120k training and 7.6k test examples are provided. -
CNN/DailyMail and XSum
The CNN/DailyMail dataset is a collection of news articles paired with multi-sentence highlight summaries, and the XSum dataset is a collection of BBC news articles paired with one-sentence summaries. -
Clickbait Challenge 2017
The Clickbait Challenge 2017 dataset, a collection of social media posts and their corresponding article titles, used for clickbait detection. -
Diggs dataset
The dataset used for testing the sLDA model [16]. -
Fake News Challenge Stage 1 (FNC-1)
The FNC-1 dataset frames stance detection as a supervised classification task: the goal is to automatically predict the stance of a news article body relative to a headline (agree, disagree, discuss, or unrelated). -
ImageNet and SST2 datasets
The datasets used in this study for image classification (ImageNet) and text classification (SST2) tasks. -
LLM dataset
The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their... -
MMLU dataset
The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) dataset, which consists of 57 tasks from Science, Technology, Engineering, and Math (STEM),... -
SST-2, Irony, IronyB, TREC6, and SNIPS
The datasets used in this paper are SST-2, Irony, IronyB, TREC6, and SNIPS. -
CIFAR-100 and AGNews
Two datasets used for multi-task learning: CIFAR-100 and AGNews.