
WikiText-103

WikiText-103 is a language modeling dataset of over 100 million tokens drawn from Wikipedia articles, with a vocabulary of roughly 268K words. Articles are kept whole and their sentences appear in their original order, so models can condition on contexts longer than a single sentence.
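
As a rough illustration of the point about larger contexts, the sketch below counts how many preceding tokens a model could see at each position, first when every sentence is treated in isolation and then when sentences form one consecutive stream. The toy sentences, the function names, and the `max_context` cap are all assumptions made for this example; none of them come from the dataset or its paper.

```python
from typing import List


def isolated_context_lengths(sentences: List[List[str]]) -> List[int]:
    """Preceding tokens visible at each position when sentences are independent."""
    lengths = []
    for sentence in sentences:
        # Context resets to zero at the start of every sentence.
        lengths.extend(range(len(sentence)))
    return lengths


def consecutive_context_lengths(sentences: List[List[str]], max_context: int) -> List[int]:
    """Preceding tokens visible when sentences are read as one ordered stream.

    max_context is a hypothetical cap standing in for a model's context limit.
    """
    total = sum(len(sentence) for sentence in sentences)
    return [min(position, max_context) for position in range(total)]


# Toy data, not taken from WikiText-103.
toy = [["the", "cat", "sat", "."], ["it", "purred", "."]]
isolated = isolated_context_lengths(toy)
consecutive = consecutive_context_lengths(toy, max_context=5)
print(isolated)     # context restarts at the second sentence
print(consecutive)  # context keeps growing across the sentence boundary
```

In the consecutive case the token "it" can be conditioned on the whole preceding sentence, which is the property that shuffled, sentence-level corpora lack.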


Cite this as

S. Merity, C. Xiong, J. Bradbury, R. Socher (2024). Dataset: WikiText-103. https://doi.org/10.57702/b35ezmet

DOI retrieved: November 25, 2024

Additional Info

Created: November 25, 2024
Last update: November 25, 2024
Defined In: https://doi.org/10.48550/arXiv.1612.08083
Citation:
  • https://doi.org/10.48550/arXiv.2004.14996
  • https://doi.org/10.48550/arXiv.1812.10860
Author: S. Merity
More Authors: C. Xiong, J. Bradbury, R. Socher
Homepage: https://arxiv.org/abs/1609.07843