Distributed representations of words and phrases and their compositionality

doi:10.57702/kcdhx0zi

The word2vec dataset is a pretrained word embedding dataset whose vocabulary contains 3 million words and phrases.

Data and Resources

Original Metadata (JSON)
The json representation of the dataset with its distributions based on DCAT.

Cite this as

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, Jeff Dean (2025). Dataset: Distributed representations of words and phrases and their compositionality. https://doi.org/10.57702/kcdhx0zi

DOI retrieved: January 2, 2025

Additional Info

Field	Value
Created	January 2, 2025
Last update	January 2, 2025
Defined In	https://doi.org/10.48550/arXiv.2010.10813
Citation	https://doi.org/10.48550/arXiv.1709.00947 https://doi.org/10.48550/arXiv.2011.00318
Author	Tomas Mikolov
More Authors	Ilya Sutskever, Kai Chen, Greg S Corrado, Jeff Dean
Homepage	https://code.google.com/archive/p/word2vec/