Finance and Academic Papers

The dataset used in the paper is a mixture of a general corpus and a domain-specific corpus. The paper reports a power-law relationship among loss, mixture ratio, and the scale of training tokens.

Data and Resources

This dataset has no data

Cite this as

Jiawei Gu, Zacc Yang, Chuanghao Ding, Rui Zhao, Fei Tan (2025). Dataset: Finance and Academic Papers. https://doi.org/10.57702/t60tevy0

Private DOI: This DOI is not yet resolvable. It is available for use in manuscripts and will be published when the dataset is made public.

Additional Info

Field: Value
Created: January 2, 2025
Last update: January 2, 2025
Defined In: https://doi.org/10.48550/arXiv.2407.17467
Author: Jiawei Gu
More Authors: Zacc Yang, Chuanghao Ding, Rui Zhao, Fei Tan