3 datasets found

Formats: JSON
Tags: Web Pages

  • WebKB

    The dataset used in this paper is a probabilistic version of the WebKB dataset, built for probabilistic logic programming.
  • CommonCrawl

    CommonCrawl is a non-profit organization that provides a large corpus of web pages for research and development purposes.
  • Common Crawl

    The Common Crawl (CC) project crawls and indexes content available online. It generates 200-300 TiB of data per month (around 5% of which is in French), and constitutes the...
You can also access this registry using the API (see API Docs).