Text Summarization - Groups

Aggrefact-Unified dataset

The Aggrefact-Unified dataset is a collection of news documents and summaries with factual errors.
- Dataset
- JSON
IgboSum1500

IgboSum1500 is an Igbo text summarization dataset containing 1,500 articles.
- Dataset
- JSON
Rotten Tomatoes

The Rotten Tomatoes dataset has 5,331 positive and 5,331 negative review sentences.
- Dataset
- JSON
Sentence Reduction for Automatic Text Summarization

The dataset used in this paper for the sentence reduction task.
- Dataset
- JSON
TL;DR: Mining reddit to learn automatic summarization

The authors used the TL;DR dataset, which consists of Reddit posts paired with summaries.
- Dataset
- JSON
Famous Keyword Twitter Replies

The Famous Keyword Twitter Replies dataset is a collection of Twitter data focused on popular keywords and the replies associated with them.
- Dataset
- JSON
Document Summarization Dataset

The dataset used in the paper is a document summarization dataset. The goal is to extract sentences, within a character budget B, that maximize coverage of the human-annotated summaries.
- Dataset
- JSON
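The budgeted extraction goal described above can be illustrated with a small greedy sketch. This is not the paper's method: it assumes coverage is measured as the set of reference-summary words covered (a crude ROUGE-1-recall proxy) and uses a common greedy heuristic that picks, at each step, the sentence adding the most newly covered words per character while keeping the total length within B. The function names `tokens` and `greedy_budgeted_extract` are hypothetical.

```python
import re
from typing import List, Set


def tokens(text: str) -> Set[str]:
    """Lowercased alphanumeric word set (a crude stand-in for real tokenization)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def greedy_budgeted_extract(sentences: List[str], reference: str, budget: int) -> List[int]:
    """Return indices of selected sentences whose total character length is <= budget.

    Assumption: coverage = number of distinct reference words covered; each step
    picks the sentence with the highest new-coverage gain per character.
    """
    ref_words = tokens(reference)
    covered: Set[str] = set()
    chosen: List[int] = []
    used = 0
    remaining = set(range(len(sentences)))
    while remaining:
        best, best_ratio = None, 0.0
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            cost = len(sentences[i])
            if cost == 0 or used + cost > budget:
                continue  # empty sentence or would exceed the character budget
            gain = len((tokens(sentences[i]) & ref_words) - covered)
            ratio = gain / cost
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:  # nothing fits, or nothing adds new coverage
            break
        chosen.append(best)
        covered |= tokens(sentences[best]) & ref_words
        used += len(sentences[best])
        remaining.remove(best)
    return sorted(chosen)


if __name__ == "__main__":
    doc = [
        "The storm hit the coast.",
        "Officials reported heavy flooding.",
        "A local bakery opened.",
    ]
    ref = "Storm causes heavy flooding on the coast."
    print(greedy_budgeted_extract(doc, ref, budget=60))
```

With a 60-character budget the first two sentences fit and both add reference words, so the sketch selects indices `[0, 1]`; the third sentence covers no reference words and is never chosen. Scaling gain by cost favors short, dense sentences, which matters when B is tight.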
Text Summarization

A dataset for the text summarization task, in which a summarizer produces an utterance of one or more sentences that succinctly reports the main content of a text.
- Dataset
- JSON
Se2: Sequential Example Selection for In-Context Learning

The paper proposes a novel approach to the sequential example selection paradigm for in-context learning.
- Dataset
- JSON
WCEP

The Wikipedia Current Events Portal (WCEP) dataset consists of short, human-written summaries of news events, the articles for which are all extracted from the Wikipedia...
- Dataset
- JSON
Multi-News

The dataset used in the paper is a collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original...
- Dataset
- JSON
TAC’08 and TAC’09 datasets

Two multi-document summarization datasets from the Text Analysis Conference (TAC) shared tasks: TAC’08 and TAC’09.
- Dataset
- JSON
XSUM Dataset

The XSUM dataset comprises 226,711 British Broadcasting Corporation (BBC) articles paired with their single-sentence summaries.
- Dataset
- JSON
DUC2002, DUC2003, DUC2005 datasets

Multi-document summarization datasets.
- Dataset
- JSON
Wikibio Dataset

Text summarization and data-to-text generation datasets.
- Dataset
- JSON
Gigaword and New York Times Annotated Corpus

Text summarization and data-to-text generation datasets.
- Dataset
- JSON
Towards a unified multi-dimensional evaluator for text generation

The NewsRoom dataset consists of 60 input source texts, each paired with 7 output summaries.
- Dataset
- JSON
DUC-2004

The DUC-2004 dataset is used for sentence summarization. It contains 500 documents, each with 4 model summaries.
- Dataset
- JSON
TEDLIUM Corpus

The TEDLIUM corpus is a large corpus used for speech recognition and text summarization.
- Dataset
- JSON
NYT

Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
- Dataset
- JSON

39 datasets found