Dataset - LDM

Diggs dataset

The dataset used for testing the sLDA model [16].
- Dataset
- JSON
ImageNet and SST2 datasets

The datasets used in this study for image and text classification tasks.
- Dataset
- JSON
LLM dataset

The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...
- Dataset
- JSON
MMLU dataset

The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) dataset, which consists of 57 tasks from Science, Technology, Engineering, and Math (STEM),...
- Dataset
- JSON
Bibtex

The dataset is used for multilabel learning tasks. It contains 7395 documents, each labeled with relevant tags drawn from a set of 159.
- Dataset
- JSON
SST-2, Irony, IronyB, TREC6, and SNIPS

The datasets used in this paper are SST-2, Irony, IronyB, TREC6, and SNIPS.
- Dataset
- JSON
AGNews

The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.
- Dataset
- JSON
CIFAR-100 and AGNews

Two datasets used for multi-task learning: CIFAR-100 and AGNews.
- Dataset
- JSON
Sem2015-Laptop

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
Sem2015-Restaurant

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
BeerAdvocate

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
CitySearch

The dataset used for Aspect-Based Sentiment Analysis (ABSA) experiments.
- Dataset
- JSON
A Million News Headlines, Fake and real news, Getting Real about Fake News

The dataset is a combination of three separate datasets: A Million News Headlines, Fake and real news, and Getting Real about Fake News.
- Dataset
- JSON
Rotten Tomatoes

The Rotten Tomatoes dataset has 5331 positive and 5331 negative review sentences.
- Dataset
- JSON
Reuters Corpus Volume 2

A multilingual corpus of 487,000 news stories.
- Dataset
- JSON
Harry Potter unlearning dataset

The dataset used in the paper is a concatenation of the original Harry Potter books and synthetic discussions, blog posts, and wiki-like entries about the books.
- Dataset
- JSON
Alexa Massive

The dataset used in the paper for instruction tuning of Large Language Models (LLMs).
- Dataset
- JSON
RT

The dataset used in the paper for instruction tuning of Large Language Models (LLMs).
- Dataset
- JSON
SST

The dataset used in the paper is the Stanford Sentiment Treebank (SST) dataset, which contains standard train/dev/test sets and two subtasks: binary sentence classification or...
- Dataset
- JSON
Sample Selection for Data Augmentation in Natural Language Processing

Deep learning-based text classification models need abundant labeled data to obtain competitive performance. To tackle this, several studies use data augmentation to...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

110 datasets found