1 dataset found

Formats: JSON
Tags: Pretraining Data

  • RedPajama

    The RedPajama dataset is an open-source recipe to reproduce the LLaMA training dataset.

You can also access this registry using the API (see API Docs).