Dataset - LDM

CODE-NN

A dataset for automatically generating summary comments for Java methods.
- Dataset
- JSON
Expository Writing Dataset

A dataset for expository writing tasks, including summarization, expert writing, and augmented writing.
- Dataset
- JSON
Filtered Spotify Podcast Dataset

After filtering, the dataset consists of 90,055 episodes.
- Dataset
- JSON
Spotify Podcast Dataset

The Spotify Podcast Dataset consists of 105,360 episodes with transcripts and creator descriptions, and is provided as a training dataset for the summarization task.
- Dataset
- JSON
SummEval and Topical-Chat

The paper uses the SummEval and Topical-Chat datasets to evaluate the quality of generated summaries and dialogue responses.
- Dataset
- JSON
Multi-News

The dataset used in the paper is a collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original...
- Dataset
- JSON
Multi-XScience

The dataset used in the paper is a collection of summaries of longer texts, with human evaluators' ratings of existing summaries.
- Dataset
- JSON
SciTLDR

The dataset used in the paper is a collection of summaries of longer texts, with human evaluators' ratings of existing summaries.
- Dataset
- JSON
CNN/DM

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON
AMI Meeting Corpus

The AMI Meeting Corpus consists of meeting conversations recorded in three instrumented rooms. Each room has two microphone arrays to collect 100 hours of far-field...
- Dataset
- JSON
SAMSum

The SAMSum dataset is a benchmark for automatic summarization evaluation, containing dialogues and their associated reference summaries.
- Dataset
- JSON
CNN/DailyMail

A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said.
- Dataset
- JSON
Unified Multi-scenario Summarization Evaluation Model

UMSE is a unified multi-scenario summarization evaluation framework that can perform semantic evaluation on three typical evaluation scenarios: Sum-Ref, Sum-Doc, and Sum-Doc-Ref...
- Dataset
- JSON
Training a helpful and harmless assistant with reinforcement learning from hu...

The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, which benefits controllable generation.
- Dataset
- JSON
Big Patent Dataset

The Big Patent dataset is a large-scale dataset for abstractive and coherent summarization.
- Dataset
- JSON
Anthropic's HH-RLHF and OpenAI's summarization datasets

The paper uses Anthropic's HH-RLHF dataset and OpenAI's summarization dataset.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

16 datasets found