-
Proof-Pile-2
The dataset used for continual pre-training of large language models, with a focus on balancing the text distribution and mitigating overfitting.
-
DeepMind Mathematics Dataset
The DeepMind Mathematics Dataset consists of synthetically generated math problems. They cover a range of problem types, including numbers, comparison, measurement, arithmetic,...
-
HOL Light and Flyspeck corpora
The dataset consists of the core HOL Light corpus and the Flyspeck corpus, with millions of nodes representing atomic inferences.
-
COVID-19 dataset
The dataset used in the paper comprises COVID-19 case data, state restriction policy, population and density, population with higher risk, age structure data, race structure data, and...
-
GeoQA and GeoQA+
Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand...
-
Math Dataset
The Math dataset is collected from the widely used online learning system Zhixue, which contains mathematical exercises and logs of high school examinations.