C4 dataset

The paper does not describe the dataset in detail. It states only that the authors trained a GPT-2 transformer language model on the C4 (Colossal Clean Crawled Corpus) dataset.


Cite this as

Ziyi Guan, Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu (2024). Dataset: C4 dataset. https://doi.org/10.57702/bsbjlzeg

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined in: https://doi.org/10.48550/arXiv.2312.17295
Citation:
  • https://doi.org/10.1145/3649329.3658498
  • https://doi.org/10.48550/arXiv.2203.06211
Author: Ziyi Guan
More authors: Hantao Huang, Yupeng Su, Hong Huang, Ngai Wong, Hao Yu
Homepage: https://huggingface.co/datasets/c4
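The homepage points to the Hugging Face Hub. The sketch below shows one way to pull a few C4 documents for inspection with the Hugging Face `datasets` library. It assumes the corpus is reachable under the Hub identifier `allenai/c4` with the English config `"en"`, that `datasets` is installed, and that each record carries a `"text"` field. The helper names `stream_c4` and `take_texts` are hypothetical and illustrative only. This is not the preprocessing pipeline used in the paper.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Mapping


def stream_c4(config: str = "en", split: str = "train"):
    """Hypothetical helper: lazily stream C4 from the Hugging Face Hub.

    Assumes the Hub identifier "allenai/c4"; streaming=True avoids
    downloading the full corpus (hundreds of GB for the "en" config).
    """
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("allenai/c4", config, split=split, streaming=True)


def take_texts(records: Iterable[Mapping[str, str]], n: int) -> list[str]:
    """Hypothetical helper: return the "text" field of the first n records."""
    return [record["text"] for record in islice(records, n)]
```

For example, `take_texts(stream_c4(), 3)` would return the raw text of the first three training documents. Separating the loader from `take_texts` keeps the network-dependent part isolated, so the selection logic works on any iterable of records, including a small in-memory sample.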