Text Summarization - Groups

XSUM Dataset

The XSUM dataset comprises 226,711 British Broadcasting Corporation (BBC) articles paired with their single-sentence summaries.
- Dataset
- JSON
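
As a rough illustration of the "article paired with a single-sentence summary" structure described for XSUM, the following minimal sketch stores one pair as a record and round-trips it through a JSON line. The class name `SummaryPair`, the field names `document` and `summary`, and the placeholder text are assumptions made for illustration; they are not taken from the dataset's actual schema.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class SummaryPair:
    # Hypothetical field names; the real dataset files may use different keys.
    document: str
    summary: str


def to_json_line(pair: SummaryPair) -> str:
    """Serialize one pair as a single JSON line."""
    return json.dumps(asdict(pair), ensure_ascii=False)


def from_json_line(line: str) -> SummaryPair:
    """Rebuild a pair from a JSON line produced by to_json_line."""
    return SummaryPair(**json.loads(line))


# Placeholder content, not real dataset text.
example = SummaryPair(
    document="Placeholder article text. It has several sentences.",
    summary="Placeholder one-sentence summary.",
)
round_tripped = from_json_line(to_json_line(example))
```

Keeping one record per JSON line is only one possible layout; it is used here because it makes streaming over a large collection straightforward.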
DUC2002, DUC2003, DUC2005 datasets

Multi-document summarization datasets
- Dataset
- JSON
Wikibio Dataset

Text summarization and data-to-text generation datasets
- Dataset
- JSON
Gigaword and New York Times Annotated Corpus

Text summarization and data-to-text generation datasets
- Dataset
- JSON
Towards a unified multi-dimensional evaluator for text generation

The NewsRoom dataset consists of 60 input source texts and 7 output summaries for each sample.
- Dataset
- JSON
DUC-2004

The DUC-2004 dataset is used for sentence summarization. It contains 500 documents, each with 4 model summaries.
- Dataset
- JSON
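
The "each document has 4 model summaries" layout described for DUC-2004 could be represented along the following lines. This is a sketch under stated assumptions: the class `MultiReferenceExample`, its field names, and the `is_complete` check are hypothetical and do not reflect the dataset's distribution format; only the count of 4 comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List

# Number of model summaries per document, as stated in the description above.
EXPECTED_REFERENCES = 4


@dataclass
class MultiReferenceExample:
    # Hypothetical field names chosen for illustration.
    document: str
    model_summaries: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """True when exactly the expected number of non-empty summaries is present."""
        return (
            len(self.model_summaries) == EXPECTED_REFERENCES
            and all(s.strip() for s in self.model_summaries)
        )


# Placeholder content, not real dataset text.
complete = MultiReferenceExample(
    document="Placeholder document.",
    model_summaries=["Summary A.", "Summary B.", "Summary C.", "Summary D."],
)
incomplete = MultiReferenceExample(
    document="Placeholder document.",
    model_summaries=["Summary A.", ""],
)
```

A check like `is_complete` can be run over a loaded collection to flag documents whose references are missing before any evaluation is attempted.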
TEDLIUM Corpus

The TEDLIUM corpus is a large-volume corpus used for speech recognition and text summarization.
- Dataset
- JSON
NYT

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON
CNN/DM

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON
English Gigaword

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON
TED: A Pretrained Unsupervised Summarization Model

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON
SAMSum

The SAMSum dataset is a benchmark for automatic summarization evaluation, containing dialogues and their associated reference summaries.
- Dataset
- JSON
CLCV

The CLCV dataset is used for evaluation.
- Dataset
- JSON
ARXIV

The ARXIV dataset is used for evaluation.
- Dataset
- JSON
DUC 2007

The DUC 2007 dataset is used for evaluation.
- Dataset
- JSON
XSUM

The XSUM dataset is used for training and evaluation.
- Dataset
- JSON
ELI5, FinanceQA, MultiNews, and QMSum datasets

The ELI5, FinanceQA, MultiNews, and QMSum datasets were used in the paper.
- Dataset
- JSON
DeFacto

The DeFacto dataset is a resource specifically curated to enhance the factual consistency of machine-generated summaries through the inclusion of human-annotated demonstrations...
- Dataset
- JSON
ROCStories

The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.
- Dataset
- JSON
SPACE and AMAZON datasets

The SPACE dataset contains hotel reviews, and the AMAZON dataset contains product reviews.
- Dataset
- JSON
