Dataset - LDM

P-Stance, SemEval-2016, and MTSD datasets

Datasets for political stance detection: P-Stance, SemEval-2016, and MTSD.
- Dataset
- JSON
UNIREX

The UNIREX framework extends the approach to a more general setting.
- Dataset
- JSON
Movie Reviews

The Movie Reviews dataset contains positive and negative movie reviews with rationales annotated by humans to support classification.
- Dataset
- JSON
DBpedia Animals

The DBpedia Animals dataset comprises 10,000 English Wikipedia article abstracts for animal species.
- Dataset
- JSON
DynaSent

The DynaSent dataset contains approximately 122,000 sentences, each labeled as positive, neutral, or negative.
- Dataset
- JSON
RCV1: A new benchmark collection for text categorization research

RCV1: A new benchmark collection for text categorization research.
- Dataset
- JSON
AGNews, 20News, NYT, IMDB

AGNews, 20News, NYT, and IMDB are datasets used for weakly supervised text classification.
- Dataset
- JSON
HateXplain

The HateXplain dataset contains 20,000 posts from Gab and Twitter, each annotated with a hate, offensive, or normal label.
- Dataset
- JSON
CLIMABENCH

CLIMABENCH is a benchmark of climate-related text classification tasks. It collates five existing climate change-related text datasets, including CLIMATEXT, CLIMATESTANCE,...
- Dataset
- JSON
AllNews

The dataset used in this paper is a collection of news articles from AllNews.
- Dataset
- JSON
Wiki40B

The dataset used in this paper is a collection of documents from Wikipedia.
- Dataset
- JSON
NeurIPS dataset

The NeurIPS dataset is a collection of 7,241 papers published in NeurIPS from 1987 to 2016.
- Dataset
- JSON
Wikipedia dataset

The dataset used in the paper is the Wikipedia dataset, which contains over six million English Wikipedia articles with a full-text field associated with 50 training queries...
- Dataset
- JSON
20News

Topic modeling has been a widely used tool for unsupervised text analysis. However, comprehensive evaluations of a topic model remain challenging.
- Dataset
- JSON
Reuters Dataset

The Reuters dataset is a text classification dataset containing 21,578 samples.
- Dataset
- JSON
Text Classification Datasets

The dataset used in the paper is a collection of adversarial examples and natural examples for text classification tasks.
- Dataset
- JSON
Shakespeare dataset

Mobile crowdsensing has gained significant attention in recent years and has become a critical paradigm for emerging Internet of Things applications. The sensing devices...
- Dataset
- JSON
TFDS

Text dataset for text classification and sentiment analysis tasks.
- Dataset
- JSON
NYT

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON
Reuters21578

The problem of similarity search is to find the most similar items in a large collection to a query item of interest. Fast similarity search is at the core of many information...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

103 datasets found