2 datasets found

Formats: JSON
Tags: text distribution

  • Proof-Pile-2

    A dataset for continual pre-training of large language models, with a focus on balancing the text distribution and mitigating overfitting.
  • Open-Orca

    A dataset for training large language models, with a focus on balancing the text distribution and mitigating overfitting.
You can also access this registry using the API (see API Docs).