C4 dataset

The paper does not describe the dataset explicitly; it states only that the authors trained a GPT-2 transformer language model on the C4 dataset.

Cite this as

Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu (2024). Dataset: C4 dataset. https://doi.org/10.57702/bsbjlzeg

DOI retrieved: December 16, 2024

Additional Info

Field          Value
Created        December 16, 2024
Last update    December 16, 2024
Defined In     https://doi.org/10.48550/arXiv.2312.17295
Citation       https://doi.org/10.1145/3649329.3658498
               https://doi.org/10.48550/arXiv.2203.06211
Author         Ziyi Guan
More Authors   Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu
Homepage       https://huggingface.co/datasets/c4