
English Wikipedia Dataset

The dataset consists of English Wikipedia articles used to train word vector models. It contains 5.3M articles, 83M sentences, and 1,676M tokens.
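
As a rough sanity check on these figures, the averages they imply can be worked out with simple arithmetic. The sketch below is illustrative only: the constants are the rounded counts quoted in the description, and the function name `corpus_ratios` is hypothetical, not part of any tooling shipped with the dataset.

```python
# Rounded corpus statistics as quoted in the dataset description.
ARTICLES = 5.3e6    # 5.3M articles
SENTENCES = 83e6    # 83M sentences
TOKENS = 1676e6     # 1,676M tokens


def corpus_ratios(articles, sentences, tokens):
    """Return the average sizes implied by the three corpus counts."""
    return {
        "sentences_per_article": sentences / articles,
        "tokens_per_sentence": tokens / sentences,
        "tokens_per_article": tokens / articles,
    }


# Hypothetical helper applied to the rounded counts above.
ratios = corpus_ratios(ARTICLES, SENTENCES, TOKENS)
for name, value in ratios.items():
    print(f"{name}: {value:.1f}")
```

With the rounded counts, this comes to roughly 16 sentences per article and about 20 tokens per sentence. The exact values depend on the sentence splitter and tokenizer used, which this page does not specify.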


Cite this as

Sungjoon Park, JinYeong Bak, Alice Oh (2024). Dataset: English Wikipedia Dataset. https://doi.org/10.57702/hel2bi07

DOI retrieved: November 25, 2024

Additional Info

Created: November 25, 2024
Last update: November 25, 2024
Defined In: https://doi.org/10.18653/v1/D17-1041
Author: Sungjoon Park
More Authors: JinYeong Bak, Alice Oh
Homepage: https://dumps.wikimedia.org/enwiki/20170120/