2 datasets found

Tags: web scraping

Filter Results
  • CommonCrawl

    CommonCrawl is a non-profit organization that provides a large corpus of web pages for research and development purposes.
  • LAION

    The dataset used in the paper is not explicitly described, but it is mentioned that it is a large-scale captioned image dataset (LAION) used to train the Stable Diffusion model.