- Expository Writing Dataset
  A dataset for expository writing tasks, including summarization, expert writing, and augmented writing.
- Filtered Spotify Podcast Dataset
  After filtering, the dataset consists of 90,055 episodes.
- Spotify Podcast Dataset
  The Spotify Podcast Dataset consists of 105,360 episodes with transcripts and creator descriptions, and is provided as a training dataset for the summarization task.
- SummEval and Topical-Chat
  This paper uses the SummEval and Topical-Chat datasets to evaluate the quality of summaries and responses.
- Multi-News
  The dataset used in the paper is a collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original...
- Multi-XScience
  The dataset used in the paper is a collection of summaries of longer texts, with human evaluators' ratings of existing summaries.
- AMI Meeting Corpus
  The AMI Meeting Corpus was collected in three instrumented rooms with meeting conversations. Each room has two microphone arrays to collect 100 hours of far-field...
- CNN/DailyMail
  A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said.
- Unified Multi-scenario Summarization Evaluation Model
  UMSE is a unified multi-scenario summarization evaluation framework that can perform semantic evaluation on three typical evaluation scenarios: Sum-Ref, Sum-Doc, and Sum-Doc-Ref...
- Training a helpful and harmless assistant with reinforcement learning from hu...
  The authors propose a novel approach that incorporates parameter-efficient tuning to better optimize control tokens, thus benefiting controllable generation.
- Big Patent Dataset
  The Big Patent dataset is a large-scale dataset for abstractive and coherent summarization.
- Anthropic's HH-RLHF and OpenAI's summarization datasets
  The paper uses Anthropic's HH-RLHF and OpenAI's summarization datasets.