WikiText-103

doi:10.57702/b35ezmet

WikiText-103 is a dataset containing over 100 million tokens with a vocabulary of about 200K words. Its sentences are consecutive, so models can condition on larger contexts rather than on single sentences.
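As a rough illustration of why consecutive sentences matter, the sketch below builds next-token training pairs whose context window crosses sentence boundaries. The function name `context_windows`, the whitespace tokenization, and the two toy sentences are assumptions made for this example; they are not taken from WikiText-103 or from the authors' code.

```python
from typing import Iterator, List, Tuple


def context_windows(
    tokens: List[str], context_len: int
) -> Iterator[Tuple[List[str], str]]:
    """Yield (context, target) pairs from one continuous token stream.

    The context holds up to `context_len` tokens that precede the target,
    regardless of where sentence boundaries fall.
    """
    for i in range(1, len(tokens)):
        start = max(0, i - context_len)
        yield tokens[start:i], tokens[i]


# Toy stand-in for consecutive sentences (not actual WikiText-103 text).
sentences = [
    "the cat sat on the mat .",
    "it then fell asleep .",
]

# Concatenate the sentences in order so that context can span them.
stream = [tok for sentence in sentences for tok in sentence.split()]

pairs = list(context_windows(stream, context_len=8))
last_context, last_target = pairs[-1]
print(last_context, "->", last_target)
```

With shuffled, isolated sentences the context for the second sentence would stop at its first token; with a consecutive stream it also includes tokens from the sentence before it.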

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

S. Merity, C. Xiong, J. Bradbury, R. Socher (2024). Dataset: WikiText-103. https://doi.org/10.57702/b35ezmet

DOI retrieved: November 25, 2024

Additional Info

Field	Value
Created	November 25, 2024
Last update	November 25, 2024
Defined In	https://doi.org/10.48550/arXiv.1612.08083
Citation	https://doi.org/10.48550/arXiv.2004.14996 https://doi.org/10.48550/arXiv.1812.10860
Author	S. Merity
More Authors	C. Xiong, J. Bradbury, R. Socher
Homepage	https://arxiv.org/abs/1609.07843