arXMLiv 2018

doi:10.57702/zauxa58e

The arXMLiv 2018 dataset is an HTML collection of the arXiv.org preprint archive, used as a training corpus for word embedding techniques.

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.
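
As a rough illustration of how such a DCAT-based JSON record could be read, the following minimal sketch lists the distributions it declares. The sample record, its exact layout, and the helper name `list_distributions` are assumptions made for this example; only the property names (`dcat:distribution`, `dct:title`, `dct:format`) come from the DCAT and Dublin Core vocabularies, and the actual metadata file may be shaped differently.

```python
import json

# Hypothetical DCAT-style record for illustration only;
# the real metadata file may use a different layout.
SAMPLE = """
{
  "dct:title": "arXMLiv 2018",
  "dct:identifier": "https://doi.org/10.57702/zauxa58e",
  "dcat:distribution": [
    {"dct:title": "Original Metadata", "dct:format": "JSON"}
  ]
}
"""


def list_distributions(raw):
    """Return (title, format) pairs for each distribution in a DCAT-style record."""
    record = json.loads(raw)
    distributions = record.get("dcat:distribution", [])
    # A single distribution may be serialized as one object instead of a list.
    if isinstance(distributions, dict):
        distributions = [distributions]
    return [(d.get("dct:title"), d.get("dct:format")) for d in distributions]


for title, fmt in list_distributions(SAMPLE):
    print(f"{title} ({fmt})")
```

Using `.get` with defaults keeps the sketch tolerant of records that omit a property, which is common in harvested catalog metadata.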

Cite this as

André Greiner-Petter, Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp (2024). Dataset: arXMLiv 2018. https://doi.org/10.57702/zauxa58e

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.1905.08359
Author	André Greiner-Petter
More Authors	Terry Ruas, Moritz Schubotz, Akiko Aizawa, William Grosky, Bela Gipp
Homepage	https://arxiv.org/