CommonCrawl

CommonCrawl is a non-profit organization that crawls the web and provides a large, freely available corpus of web pages for research and development.

BibTeX: