S2ORC

A collection of 81.1 million scholarly publications in English from various academic fields, used to pre-train a language model.

Cite this as

Lo, K., Wang, L. L., Neumann, M., Kinney, R., Weld, D. S. (2024). Dataset: S2ORC. https://doi.org/10.57702/g2wuqc2w

DOI retrieved: December 16, 2024

Additional Info

Field Value
Created December 16, 2024
Last update December 16, 2024
Defined In https://doi.org/10.48550/arXiv.2212.03869
Citation
  • https://doi.org/10.48550/arXiv.2303.14334
  • https://doi.org/10.48550/arXiv.2307.12996
  • https://doi.org/10.48550/arXiv.2401.01089
  • https://doi.org/10.18653/v1/2023.emnlp-main.822
Author Lo, K.
More Authors
  • Wang, L. L.
  • Neumann, M.
  • Kinney, R.
  • Weld, D. S.
Homepage https://s2orc.org/