
Distributed representations of words and phrases and their compositionality

The word2vec dataset is a word embedding dataset containing vectors for 3 million words and phrases.
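The following is a minimal sketch for reading and comparing the vectors. It assumes that the vectors are downloaded in the original word2vec C binary format: an ASCII header giving the vocabulary size and vector dimension, followed by one entry per word. Each entry consists of the word, a space, and the vector as little-endian float32 values. The sketch uses only the Python standard library. The function names `read_word2vec_bin` and `cosine` are illustrative and are not part of any released tool.

```python
import math
import struct


def read_word2vec_bin(f, limit=None):
    """Read (word, vector) entries from a binary file object.

    Assumed layout (word2vec C binary format): header b"<vocab_size> <dim>\\n",
    then for each entry the word bytes, a space, and `dim` little-endian float32s.
    """
    vocab_size, dim = map(int, f.readline().split())
    if limit is not None:
        vocab_size = min(vocab_size, limit)
    vectors = {}
    for _ in range(vocab_size):
        # Collect word bytes up to the separating space; skip any newline
        # left over after the previous vector.
        chars = []
        while True:
            ch = f.read(1)
            if ch == b" ":
                break
            if ch == b"":
                raise EOFError("file ended in the middle of an entry")
            if ch != b"\n":
                chars.append(ch)
        word = b"".join(chars).decode("utf-8", errors="replace")
        # Decode exactly `dim` float32 values in little-endian byte order.
        vectors[word] = struct.unpack(f"<{dim}f", f.read(4 * dim))
    return vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Usage: `with open(path, "rb") as f: vecs = read_word2vec_bin(f, limit=100000)`, where `path` is a hypothetical local copy of the downloaded file. Storing all 3 million entries as Python tuples takes a large amount of memory. Passing `limit` reads only the first entries. For full-scale work, gensim's `KeyedVectors.load_word2vec_format(path, binary=True)` is a common alternative.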


Cite this as

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, Jeff Dean (2025). Dataset: Distributed representations of words and phrases and their compositionality. https://doi.org/10.57702/kcdhx0zi

DOI retrieved: January 2, 2025

Additional Info

Field           Value
Created         January 2, 2025
Last update     January 2, 2025
Defined In      https://doi.org/10.48550/arXiv.2010.10813
Citation        https://doi.org/10.48550/arXiv.1709.00947
                https://doi.org/10.48550/arXiv.2011.00318
Author          Tomas Mikolov
More Authors    Ilya Sutskever, Kai Chen, Greg S Corrado, Jeff Dean
Homepage        https://code.google.com/archive/p/word2vec/