-
Media Frames Corpus
A dataset of annotated news articles and social media posts for frame classification. -
ECB+ dataset
The ECB+ dataset is an extended and re-annotated version of the ECB dataset, with new texts added about different event instances of the same event type. -
Multi-News
Multi-News is a collection of 45K news articles and corresponding summaries, where each summary is professionally crafted and provides links to the original... -
Global Database of Events, Language, and Tone
The Global Database of Events, Language, and Tone (GDELT) dataset is a large collection of event records from news articles from 1979 to present. The dataset is used to model... -
C4 dataset
The paper does not explicitly name its dataset, but it states that the authors trained a GPT-2 transformer language model on the C4 dataset. -
Wall Street Journal
The Wall Street Journal dataset is used for syntactic linearization. It contains a large corpus of news articles with their corresponding syntactic trees. -
AutoCast++: Enhancing World Event Prediction with Zero-Shot Ranking-Based Con...
The AutoCast++ dataset is a benchmark for event forecasting using news articles. -
English Gigaword
English Gigaword is a newswire corpus commonly used for text summarization, which aims to extract essential information from a piece of text and transform it into a concise version. -
Cnews dataset
The Cnews dataset is a collection of news articles from Sina News, filtered from 2005 to 2011. The dataset contains 10 categories of news, including sports, entertainment, home... -
CNN/DailyMail
A bus driver who was seriously injured when he was hit by a steam engine is making good progress, his wife has said. -
Disin dataset
The Disin dataset is a fake news dataset on Kaggle, including 12,600 fake news articles and 12,600 truthful news articles. -
AGNews Dataset
The AGNews dataset is a collection of news articles, where each article is labeled with a topic (e.g., politics or sports).