2 datasets found
Tags: web scraping

CommonCrawl
CommonCrawl is a non-profit organization that provides a large corpus of web pages for research and development purposes.

LAION
The paper does not describe the dataset explicitly, but mentions that it is LAION, a large-scale captioned image dataset used to train the Stable Diffusion model.