23 datasets found

Tags: text summarization

  • Aggrefact-Unified dataset

    The Aggrefact-Unified dataset is a collection of news documents and summaries with factual errors.
  • Rotten Tomatoes

    The Rotten Tomatoes dataset has 5,331 positive and 5,331 negative review sentences.
  • Sentence Reduction for Automatic Text Summarization

    The dataset used in this paper for the sentence reduction task.
  • TL;DR: Mining reddit to learn automatic summarization

    The authors used the TL;DR dataset, which consists of Reddit posts paired with summaries.
  • Document Summarization Dataset

    The dataset used in the paper is a document summarization dataset. The goal is to extract sentences, within a character budget B, that maximize coverage of the human-annotated summaries.
  • Text Summarization

    The dataset used for the text summarization task, in which a summarizer produces an utterance of one or more sentences that succinctly reports the main content of a text.
  • WCEP

    Wikipedia Current Events Portal (WCEP) dataset, which consists of short, human-written summaries of news events, the articles for which are all extracted from the Wikipedia...
  • Multi-News

    The dataset used in the paper is a collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original...
  • XSUM Dataset

    The XSUM dataset comprises 226,711 British Broadcasting Corporation (BBC) articles paired with their single-sentence summaries.
  • DUC-2004

    The DUC-2004 dataset is used for sentence summarization. It contains 500 documents, each with 4 model summaries.
  • NYT

    Used for text summarization, which aims to extract the essential information from a piece of text and condense it into a concise version.
  • CNN/DM
  • English Gigaword
  • Smart Reply and Ambient Clinical Intelligence

    The dataset used for the Smart Reply and Ambient Clinical Intelligence tasks.
  • SAMSum

    The SAMSum dataset is a benchmark for automatic summarization evaluation, containing dialogues and their associated reference summaries.
  • CLCV

    The CLCV dataset is used for evaluation.
  • ARXIV

    The ARXIV dataset is used for evaluation.
  • DUC 2007

    The DUC 2007 dataset is used for evaluation.
  • XSUM

    The XSUM dataset is used for training and evaluation.
  • DeFacto

    The DeFacto dataset is a resource specifically curated to enhance the factual consistency of machine-generated summaries through the inclusion of human-annotated demonstrations...
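The Document Summarization Dataset entry frames extraction as choosing sentences under a character budget B so that they cover as much of the human-annotated summary as possible. The snippet below is a minimal sketch of one common way to attack that objective, a greedy selection, and is not the method of the paper behind that entry. It assumes, since the entry does not say, that "coverage" means the number of distinct reference-summary words matched after naive lowercase tokenization. The function names `tokenize` and `greedy_budgeted_extract` are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercased set of alphanumeric words (a deliberately naive tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def greedy_budgeted_extract(sentences: list[str], reference: str, budget: int) -> list[str]:
    """Greedily pick sentences whose total character length stays within `budget`.

    Assumption: coverage = number of distinct reference words matched. At each
    step, take the sentence that adds the most not-yet-covered reference words.
    Greedy selection is an approximation, not guaranteed optimal.
    """
    ref_words = tokenize(reference)
    covered: set[str] = set()
    chosen: list[int] = []
    used = 0  # characters consumed so far
    while True:
        best_idx, best_gain = None, 0
        for i, sent in enumerate(sentences):
            # Skip sentences already taken or that would exceed the budget.
            if i in chosen or used + len(sent) > budget:
                continue
            gain = len((tokenize(sent) & ref_words) - covered)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        if best_idx is None:  # nothing fits, or nothing adds coverage
            break
        chosen.append(best_idx)
        used += len(sentences[best_idx])
        covered |= tokenize(sentences[best_idx]) & ref_words
    # Return the selection in original document order.
    return [sentences[i] for i in sorted(chosen)]


doc = [
    "The storm hit the coast on Monday.",
    "Officials ordered evacuations.",
    "Local sports results were mixed.",
    "Thousands lost power after the storm.",
]
ref = "A storm hit the coast, forcing evacuations and power outages."
print(greedy_budgeted_extract(doc, ref, budget=70))
```

With a 70-character budget, the first two sentences (34 + 30 characters) are selected; the fourth sentence would also add coverage but no longer fits. Real systems typically score coverage with ROUGE or weighted concepts rather than raw word overlap.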
You can also access this registry using the API (see API Docs).