18 datasets found

Tags: news articles

  • Media Frames Corpus

    A dataset of annotated news articles and social media posts for frame classification.
  • DUC-2001

    DUC-2001 is a keyphrase extraction dataset containing 309 news articles with 8 keyphrases per article.
  • ECB+ dataset

    The ECB+ dataset is an extended and re-annotated version of the ECB dataset, with new texts added about different event instances of the same event type.
  • ENTITIES

    A new dataset for timeline summarization of news articles, with more topics and longer time-ranges than previous datasets.
  • Multi-News

    A collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original...
  • Global Database of Events, Language, and Tone

    The Global Database of Events, Language, and Tone (GDELT) dataset is a large collection of event records extracted from news articles from 1979 to the present. The dataset is used to model...
  • C4 dataset

    The paper does not explicitly name its dataset, but it states that the authors trained a GPT-2 transformer language model on the C4 dataset.
  • Wall Street Journal

    The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees.
  • AutoCast++: Enhancing World Event Prediction with Zero-Shot Ranking-Based Con...

    The AutoCast++ dataset is a benchmark for event forecasting using news articles.
  • DUC-2004

    The DUC-2004 dataset is used for sentence summarization. It contains 500 documents, each with 4 model summaries.
  • NYT

    Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
  • CNN/DM

    Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
  • English Gigaword

    Text summarization aims to extract essential information from a piece of text and transform the text into a concise version.
  • News

    The News dataset consists of 5000 randomly sampled news articles from the NY Times corpus. It simulates the opinions of media consumers on news items. The units are different...
  • Cnews dataset

    The Cnews dataset is a collection of news articles from Sina News, filtered from 2005 to 2011. The dataset contains 10 categories of news, including sports, entertainment, home...
  • CNN/DailyMail

    A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said.
  • Disin dataset

    The Disin dataset is a fake news dataset on Kaggle, including 12,600 fake news articles and 12,600 truthful news articles.
  • AGNews Dataset

    The AGNews dataset is a collection of news articles, where each article is labeled with a topic (e.g., politics or sports).
You can also access this registry using the API (see API Docs).