Dataset - LDM

AG News, SogouNews and DBpedia

The AG News, SogouNews and DBpedia datasets are used for text classification experiments.
- Dataset
- JSON
Experimental Results

The authors evaluate the performance of their proposed conformal prediction methods for multistep feedback covariate shift (MFCS) on synthetic black-box optimization and active...
- Dataset
- JSON
Amazon Reviews

The Amazon Reviews dataset is used to predict the usefulness of Amazon reviews with off-the-shelf argumentation mining.
- Dataset
- JSON
WebKB

The dataset used in this paper is a probabilistic version of the WebKB dataset for probabilistic logic programming.
- Dataset
- JSON
Reuters-8

The Reuters-8 dataset is a collection of news articles from Reuters.
- Dataset
- JSON
20Newsgrp

The 20Newsgrp dataset is a collection of posts from 20 different newsgroups.
- Dataset
- JSON
MSMARCO

MSMARCO is a dataset for training and evaluating information retrieval (IR) systems. It contains a large collection of documents and queries.
- Dataset
- JSON
Twitter

Dialogue systems – often referred to as conversational agents, chatbots, etc. – provide convenient human-machine interfaces and have become increasingly prevalent with the...
- Dataset
- JSON
News

The News dataset consists of 5000 randomly sampled news articles from the NY Times corpus. It simulates the opinions of media consumers on news items. The units are different...
- Dataset
- JSON
Emotion Classification

The Emotion Classification dataset consists of emotion-related text.
- Dataset
- JSON
X-FORMAL

The X-FORMAL dataset contains pairs of formal and informal texts in four languages: Brazilian Portuguese, French, Italian, and English.
- Dataset
- JSON
GYAFC

The GYAFC dataset is a formality transfer dataset for English that contains aligned formal and informal sentences from two domains: Entertainment & Music and Family &...
- Dataset
- JSON
MARC

The MARC dataset is a multilingual text classification dataset that covers six languages.
- Dataset
- JSON
M10

The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
20 NewsGroups

The paper discusses the use of multi-objective Bayesian optimization for hyperparameter transfer in topic models.
- Dataset
- JSON
MR, Subj, SST-1, SST-2, MPQA

The MR, Subj, SST-1, SST-2, and MPQA datasets are used in this paper for text classification tasks.
- Dataset
- JSON
20NEWS Dataset

The dataset used in the paper is the 20NEWS dataset, consisting of 18,845 text documents with 20 topic labels.
- Dataset
- JSON
TEL-NLP

The TEL-NLP dataset is a collection of Telugu text data for four NLP tasks: sentiment analysis, emotion identification, hate speech detection, and sarcasm detection.
- Dataset
- JSON
Yelp Dataset

The Yelp Dataset contains 1.6M reviews and 500K tips by 366K users for 61K businesses; 481K business attributes, such as hours, parking availability, ambience; and check-ins for...
- Dataset
- JSON
IMDB Sentiment Classification

The IMDB sentiment classification dataset is used for text classification tasks.
- Dataset
- JSON

110 datasets found