BookCorpus

BookCorpus is the dataset used in the defining paper (see "Defined In" below) for unsupervised sentence representation learning. It consists of paragraphs of unlabeled text.

Cite this as

Shuai Tang, Hailin Jin, Chen Fang, Zhaowen Wang, Virginia R. de Sa (2024). Dataset: BookCorpus. https://doi.org/10.57702/wgy6lj2h

DOI retrieved: November 25, 2024

Additional Info

Field: Value
Created: November 25, 2024
Last update: December 2, 2024
Defined In: https://doi.org/10.48550/arXiv.2305.18239
Citation:
  • https://doi.org/10.48550/arXiv.1806.04480
  • https://doi.org/10.48550/arXiv.2407.18698
  • https://doi.org/10.48550/arXiv.2206.08919
  • https://doi.org/10.48550/arXiv.1706.03146
  • https://doi.org/10.48550/arXiv.1705.00557
Author: Shuai Tang
More Authors: Hailin Jin, Chen Fang, Zhaowen Wang, Virginia R. de Sa
Homepage: https://books.nlp.stanford.edu/datasets/bookcorpus.html