Clean Corpus

The Clean Corpus is a web scrape of 1.2 million Reddit threads from 1,697 top subreddits.

Cite this as

Liao, Yuan, Wang, Xing (2024). Dataset: Clean Corpus. https://doi.org/10.57702/8tf73u9s

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2011.03011
Author Liao
More Authors Yuan, Wang, Xing