Continual Pre-training - Groups - LDM

CMR Scaling Law

The dataset used in the paper is a mixture of a general corpus and a domain-specific corpus. The paper reports a power-law relationship between loss, mixture ratio, and the number of training tokens.
- Dataset
- JSON
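
To make the phrase "power-law relationship" concrete, here is a minimal sketch of fitting a generic power law, loss ≈ a · tokens^(−b), by least squares in log-log space, with the mixture ratio held fixed. This is an illustration only, not the functional form used in the paper: the function name `fit_power_law`, the constants, and the synthetic data points are all assumptions.

```python
import math


def fit_power_law(tokens, losses):
    """Fit loss ~= a * tokens**(-b) by ordinary least squares in log-log space.

    Hypothetical helper for illustration; not the paper's fitting procedure.
    """
    xs = [math.log(t) for t in tokens]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope in log-log space equals -b
    intercept = mean_y - slope * mean_x  # intercept equals log(a)
    return math.exp(intercept), -slope


# Synthetic, noise-free data at one fixed mixture ratio (assumed values).
tokens = [1e8, 1e9, 1e10, 1e11]
losses = [5.0 * t ** -0.1 for t in tokens]

a, b = fit_power_law(tokens, losses)
print(f"a = {a:.4f}, b = {b:.4f}")
```

Repeating such a fit at several mixture ratios would show how the fitted coefficients vary with the ratio. The actual dependence on mixture ratio should be taken from the paper rather than from this sketch.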
