- MWPD-PC dataset
The dataset is a collection of product offers crawled from the web, annotated with schema.org vocabulary.
- Rakuten dataset
The dataset is a collection of product offers crawled from the web, annotated with schema.org vocabulary.
- IceCat dataset
The dataset is a collection of product offers crawled from the web, annotated with schema.org vocabulary.
- WDC-25 dataset
The dataset is a collection of product offers crawled from the web, annotated with schema.org vocabulary.
- German Common Crawl
German Common Crawl is a dataset of web pages crawled from the internet.
- CommonCrawl
CommonCrawl is a non-profit organization that provides a large corpus of web pages for research and development purposes.