- XSUM Dataset
  The XSUM dataset comprises 226,711 British Broadcasting Corporation (BBC) articles paired with their single-sentence summaries.
- DUC2002, DUC2003, DUC2005 datasets
  Multi-document summarization datasets.
- Wikibio Dataset
  Text summarization and data-to-text generation datasets.
- Gigaword and New York Times Annotated Corpus
  Text summarization and data-to-text generation datasets.
- Towards a unified multi-dimensional evaluator for text generation
  The NewsRoom dataset consists of 60 input source texts, each paired with 7 output summaries.
- TEDLIUM Corpus
  The TEDLIUM corpus is a large-volume corpus used for speech recognition and text summarization.
- English Gigaword
  Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- TED: A Pretrained Unsupervised Summarization Model
  Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- ELI5, FinanceQA, MultiNews, and QMSum datasets
  The ELI5, FinanceQA, MultiNews, and QMSum datasets were used in the paper.
- ROCStories
  The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.
- SPACE and AMAZON datasets
  The SPACE dataset contains hotel reviews, and the AMAZON dataset contains product reviews.