2 datasets found
Formats: JSON
Tags: HTML pages
CleanEval
CleanEval is the largest publicly available dataset for boilerplate removal.
Format: JSON
Web2Text: Deep Structured Boilerplate Removal
Web pages are a valuable source of information for many natural language processing and information retrieval tasks. Extracting the main content from those documents is...
Format: JSON
You can also access this registry using the API (see API Docs).