arXMLiv 2018

The arXMLiv 2018 dataset is a collection of HTML documents derived from the arXiv.org preprint archive. It is used as a training corpus for word embedding techniques.

Data and Resources

This dataset has no data

Cite this as

André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp (2024). Dataset: arXMLiv 2018. https://doi.org/10.57702/zauxa58e

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined In: https://doi.org/10.48550/arXiv.1905.08359
Author: André Greiner-Petter
More Authors: Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp
Homepage: https://arxiv.org/