-
LOCO dataset
The LOCO dataset consists of a large number of documents collected from 58 conspiracy-theory media sources and 92 mainstream media sources. -
BFRS Dataset
The BFRS dataset contains news stories from Pakistan with labels for various categories related to political violence. -
Crowd Counting Consortium
The Crowd Counting Consortium dataset contains news stories about protest events in the United States with labels for various categories. -
GDPR Media Discourse
The dataset contains news articles about the GDPR from French, German, UK, and US sources. -
Berita Dataset
The Berita dataset consists of 50,304 Indonesian digital news articles shared online through Twitter. -
AG's News Corpus
AG's News Corpus -
Reuters Dataset
The Reuters dataset is a text classification dataset containing 21,578 samples. -
RCV1 Dataset
The RCV1 dataset is a corpus of Reuters news articles. -
CNN/DailyMail
The CNN/DailyMail dataset contains news articles from CNN and the Daily Mail paired with summaries. -
CNN-DailyMail Summarization Dataset
The dataset contains 280K news articles for abstractive summarization.