Yahoo Answer and Yelp15 review
Two large-scale document classification datasets: Yahoo Answer, used for topic classification, and Yelp15 review, used for sentiment classification.
CommonCrawl
CommonCrawl is a non-profit organization that provides a large, freely available corpus of crawled web pages for research and development.