Dataset - LDM

Divar Dataset

A dataset for measuring the domain similarity of Persian texts, generated from a dataset of advertisements posted on the Divar application.
- Dataset
- JSON
Towards Improving Selective Prediction Ability of NLP Systems

SNLI, MNLI (matched and mismatched), and the Stress Test datasets (Competence, Distraction, and Noise).
- Dataset
- JSON
AG News Dataset

AG News: news articles from over 2,000 news sources, annotated by topic (Sports, World, Business, and Science/Tech). It provides 120,000 training and 7,600 test examples.
- Dataset
- JSON
OTTER: Improving Zero-Shot Classification via Optimal Transport

Zero-shot models suffer due to artifacts inherited from pretraining. A particularly detrimental artifact, caused by unbalanced web-scale pretraining data, is mismatched label...
- Dataset
- JSON
CNN/DailyMail and XSum

The CNN/DailyMail dataset is a collection of news articles paired with multi-sentence highlight summaries, and XSum is a collection of BBC news articles paired with single-sentence summaries.
- Dataset
- JSON
SuperGLUE

The dataset used in the paper is the SuperGLUE benchmark, which includes 17 tasks: STS-B, MRPC, MNLI, QNLI, CoLA, SST-2, GLUE, NLI, NQ, ReCoRD, ReCoRD-Sub,...
- Dataset
- JSON
Diggs dataset

The dataset used for testing the sLDA model [16].
- Dataset
- JSON
ImageNet and SST2 datasets

The datasets used in this study for image and text classification tasks, respectively.
- Dataset
- JSON
LLM dataset

The dataset used in this paper is not explicitly described; it is only mentioned that it relates to a large language model (LLM) and that the authors used it to train and evaluate their...
- Dataset
- JSON
MMLU dataset

The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) dataset, which consists of 57 tasks from Science, Technology, Engineering, and Math (STEM),...
- Dataset
- JSON
Bibtex

The dataset is used for multi-label learning tasks. It contains 7,395 documents, each annotated with a subset of 159 possible tags.
- Dataset
- JSON
SST-2, Irony, IronyB, TREC6, and SNIPS

The datasets used in this paper are SST-2, Irony, IronyB, TREC6, and SNIPS.
- Dataset
- JSON
AGNews

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.
- Dataset
- JSON
CIFAR-100 and AGNews

Two datasets used for multi-task learning: CIFAR-100 and AGNews.
- Dataset
- JSON
Sem2015-Laptop

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
Sem2015-Restaurant

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
BeerAdvocate

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
CitySearch

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
A Million News Headlines, Fake and real news, Getting Real about Fake News

The dataset combines three individual datasets: A Million News Headlines, Fake and real news, and Getting Real about Fake News.
- Dataset
- JSON
Rotten Tomatoes

The Rotten Tomatoes dataset has 5331 positive and 5331 negative review sentences.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

136 datasets found