Dataset - LDM

C4

A dataset for pre-training language models, consisting of a large collection of text documents.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
