1 dataset found

Tags: HTML

  • arXMLiv 2018

    The arXMLiv 2018 dataset is a collection of HTML documents drawn from the arXiv.org preprint archive, used as a training corpus for word embedding techniques.
You can also access this registry using the API (see API Docs).