33 datasets found

Tags: News Articles

  • CNN-DM Dataset

    The CNN-DM dataset contains news articles and is used for training language models.
  • AP News Corpus

    The AP News corpus contains professionally edited news articles, and its vocabulary plateaus much faster than that of the Amazon corpus.
  • AG News Dataset

    The AG News dataset contains news articles from over 2000 news sources, annotated by type of news: Sports, World, Business, and Science/Tech. Training and test sets of 120k and 7k articles, respectively, are provided.
  • CNN/DailyMail and XSum

    The CNN/DailyMail dataset is a collection of news articles, and the XSum dataset is a collection of news articles with summaries.
  • AGNews

    The dataset used in the paper is not explicitly described, but it is mentioned that the authors used a variety of datasets for semi-supervised learning tasks.
  • LOCO dataset

    The LOCO dataset consists of a large number of documents collected from 58 conspiracy-theory media sources and 92 mainstream media sources.
  • Bing-News

    Bing-News is a dataset containing 1,025,192 pieces of implicit feedback collected from the server logs of Bing News.
  • ReCOVery

    This dataset contains news information verified by domain experts (labeled as true or fake) and how the news spreads on Twitter.
  • CoAID

    A dataset for COVID-19 misinformation detection, focusing on healthcare misinformation.
  • BFRS Dataset

    The BFRS dataset contains news stories from Pakistan with labels for various categories related to political violence.
  • Crowd Counting Consortium

    The Crowd Counting Consortium dataset contains news stories from Pakistan with labels for various categories.
  • CORD-19

    The CORD-19 dataset contains academic journal articles relating to a variety of coronaviruses and related viral infections, not only COVID-19, sourced from PubMed Central (PMC),...
  • GDPR Media Discourse

    The dataset contains news articles about GDPR from French, German, UK, and US sources.
  • News-26

    The dataset used in the news classification task, containing news articles with their corresponding labels.
  • MIND-15

    The dataset used in the news classification task, containing news articles with their corresponding labels.
  • Berita Dataset

    The Berita dataset consists of 50,304 digital Indonesian news articles shared online through Twitter.
  • AG's News Corpus

    AG's News Corpus
  • DUC-2004

    The DUC-2004 dataset is used for sentence summarization. It contains 500 documents, each with 4 model summaries.
  • Reuters Dataset

    The Reuters dataset is a text classification dataset containing 21,578 samples.
  • Reuters-8

    The Reuters-8 dataset is a collection of news articles from Reuters.
You can also access this registry using the API (see API Docs).