-
Gigaword sentence dataset
The Gigaword sentence dataset is a large corpus of sentences. -
AfricanHLT 2010
A dataset for automatic text summarization, containing documents in three languages. -
TREC-CAR Benchmark Y1
The dataset used for the Retrieve-Cluster-Summarize system, consisting of 117 article-level queries and 126 test queries. -
RAGSummData
The dataset used in the paper is a collection of dialogues and prompts for training a model to perform retrieval-augmented generation (RAG) based summarization. The dataset is... -
CNN, XSum, Gigaword News Headline, and Annotated Enron Subject Line Corpus
The CNN, XSum, Gigaword News Headline, and Annotated Enron Subject Line Corpus are datasets used for various NLP tasks. -
CNN/DailyMail and XSum
The CNN/DailyMail dataset is a collection of news articles paired with multi-sentence highlight summaries, and the XSum dataset is a collection of BBC news articles paired with one-sentence summaries. -
Aggrefact-Unified dataset
The Aggrefact-Unified dataset is a collection of news documents and summaries with factual errors. -
IgboSum1500
IgboSum1500 is an Igbo text summarization dataset, housing 1,500 articles. -
Rotten Tomatoes
The Rotten Tomatoes dataset has 5,331 positive and 5,331 negative review sentences. -
Sentence Reduction for Automatic Text Summarization
The dataset used in this paper for the sentence reduction task. -
TL;DR: Mining reddit to learn automatic summarization
The authors used the TL;DR dataset, which consists of Reddit posts with summaries. -
Famous Keyword Twitter Replies
The Famous Keyword Twitter Replies dataset is a comprehensive collection of Twitter data that focuses on popular keywords and their associated replies. -
Document Summarization Dataset
The dataset used in the paper is a document summarization dataset. The goal is to extract sentences, within a character budget B, that maximize coverage of the human-annotated summaries. -
Text Summarization
The dataset used for the text summarization task, where a summarizer produces an utterance made up of one or multiple sentences to succinctly report the main content of a text. -
Se2: Sequential Example Selection for In-Context Learning
The paper proposes a novel approach to the sequential example selection paradigm for in-context learning. -
Multi-News
The dataset used in the paper is a collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original... -
TAC’08 and TAC’09 datasets
Two multi-document summarization datasets from the Text Analysis Conference (TAC) shared tasks: TAC’08 and TAC’09.