Wikipedia Detox project
The dataset used in the paper is a collection of 100,000 Wikipedia talk page comments, manually labelled for 'toxicity' by workers on the Crowdflower platform.
From Detection of Toxic Spans in Online Discussions to Analysis of Toxic-to-C...
The ToxicSpans dataset is a subset of the Civil Comments dataset, annotated with toxic spans.
RealToxicityPrompts is a collection of 100k naturally occurring sentences, gathered from various internet sources and designed to serve as LM prompts.