C4

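C4 (the Colossal Clean Crawled Corpus) is a dataset used for pre-training language models. It contains a large collection of web text documents derived from Common Crawl and cleaned with heuristic filters.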

BibTeX: