The Pile dataset

The Pile is a large-scale corpus of roughly 800GB of diverse text, assembled for training language models.

Cite this as

Gao et al. (2024). Dataset: The Pile dataset. https://doi.org/10.57702/5monum7u

DOI retrieved: December 16, 2024

Additional Info

Field        Value
Created      December 16, 2024
Last update  December 16, 2024
Defined In   https://doi.org/10.48550/arXiv.2307.04401
Author       Gao et al.
Homepage     https://arxiv.org/abs/2101.00027