-
Wikipedia Detox project
The dataset used in the paper is a collection of 100,000 Wikipedia talk page comments, manually labelled for 'toxicity' by workers on the CrowdFlower platform. -
WMT 2020 Sentence-Level Direct Assessment dataset
The dataset used in the competition for the Sentence-Level Direct Assessment shared task is composed of data extracted from Wikipedia for six language pairs, consisting of... -
French Wikipedia
French Wikipedia corpus -
Wikipedia as multilingual source of comparable corpora
Wikipedia as multilingual source of comparable corpora. -
Wikipedia
Electricity, Traffic, and Wikipedia, preprocessed exactly as in (Salinas et al., 2019a), with their properties listed in Table 3. -
Electricity, Traffic, and Wikipedia
Electricity, Traffic, and Wikipedia, preprocessed exactly as in (Salinas et al., 2019a), with their properties listed in Table 3. -
Wikipedia Dispute Corpus
A newly created corpus of discussions from Wikipedia Talk pages for dispute detection. -
Authority and Alignment in Wikipedia Discussions (AAWD)
A newly created corpus of Wikipedia Talk pages for dispute detection. -
Local and global algorithms for disambiguation to Wikipedia
Local and global algorithms for disambiguation to Wikipedia. -
Fast and accurate annotation of short texts with Wikipedia pages
Fast and accurate annotation of short texts with Wikipedia pages. -
FEVER: A Large-Scale Dataset for Fact Extraction and Verification
The FEVER dataset consists of 185,455 annotated claims, together with 5,416,537 Wikipedia documents containing roughly 25 million sentences as potential evidence. -
Bias in Bios Dataset
Bias in Bios dataset, a personal biography dataset with information extracted from Wikipedia. -
UMDWikipedia dataset
The UMDWikipedia dataset contains information on around 770K edits from January 2013 to July 2014 (19 months), with 17,105 vandals and 17,105 benign users. -
Wikipedia Corpus
The dataset used in the paper is a subset of the Wikipedia corpus, consisting of 7,500 English Wikipedia articles, each belonging to one of the following categories: People, Cities,...