arXMLiv 2018

The arXMLiv 2018 dataset is a collection of arXiv.org preprints in HTML format, used as a training corpus for word embedding techniques.

Cite this as

André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp (2024). Dataset: arXMLiv 2018. https://doi.org/10.57702/zauxa58e

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined in: https://doi.org/10.48550/arXiv.1905.08359
Author: André Greiner-Petter
More authors: Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp
Homepage: https://arxiv.org/