Dataset - LDM

Amazon Reviews

The Amazon Reviews dataset is used to predict the usefulness of Amazon reviews using off-the-shelf argumentation mining.
- Dataset
- JSON

news20

The news20 dataset is a multiclass text classification dataset.
- Dataset
- JSON

sector

The sector dataset is a multiclass text classification dataset.
- Dataset
- JSON

rcv1

The rcv1 dataset is a multiclass text classification dataset.
- Dataset
- JSON

Yelp reviews polarity dataset

Yelp reviews polarity dataset
- Dataset
- JSON

Cnews dataset

The Cnews dataset is a collection of news articles from Sina News, filtered from 2005 to 2011. The dataset contains 10 categories of news, including sports, entertainment, home...
- Dataset
- JSON

IMDB Sentiment

A dataset used for training and evaluating the proposed RRHF paradigm.
- Dataset
- JSON

CNN/DailyMail

A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said.
- Dataset
- JSON

Ren-CECps

Multi-label text classification dataset Ren-CECps
- Dataset
- JSON

RCV1-v2

Multi-label text classification dataset RCV1-v2, Reuters Corpus Volume I
- Dataset
- JSON

Reuters-21578

Text classification has long been an interesting research field; its aim is to develop algorithms that find the categories of given documents.
- Dataset
- JSON

Yelp Dataset Challenge

The Yelp dataset challenge contains reviews and images of restaurants, with the goal of recommending images for each review.
- Dataset
- JSON

Natural Instructions

The Natural Instructions (NI) dataset is used for evaluating the performance of the DEPTH model on natural language understanding tasks.
- Dataset
- JSON

DiscoEval

The DiscoEval dataset is used for evaluating the performance of the DEPTH model on discourse-related tasks.
- Dataset
- JSON

C4

A dataset used for pre-training language models, containing a large collection of text documents.
- Dataset
- JSON

Amazon@Beauty and Amazon@Books datasets

The Amazon@Beauty and Amazon@Books datasets are collections of product reviews from Amazon.com.
- Dataset
- JSON

The pushshift reddit dataset

The pushshift reddit dataset
- Dataset
- JSON

IMDB dataset

The IMDB dataset is a polarity dataset for sentiment analysis or text classification. It contains 50,000 sentences and their binary class labels, being either "Positive" or...
- Dataset
- JSON

SST-2

The dataset used for the experiments across ten models, ranging from bag-of-words models to pre-trained transformers, and find that a model having higher AUC does not necessarily...
- Dataset
- JSON

Uniter dataset

The Uniter dataset is a multimodal learning dataset, which consists of images and corresponding text.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

103 datasets found