29 datasets found

Formats: JSON

  • Proof-Pile-2

    The dataset used for continual pre-training of large language models, with a focus on balancing the text distribution and mitigating overfitting.
  • DeepMind Mathematics Dataset

    The DeepMind Mathematics Dataset consists of synthetically generated math problems covering a range of problem types, including numbers, comparison, measurement, arithmetic,...
  • HOL Light and Flyspeck corpora

    The dataset consists of the core HOL Light corpus and the Flyspeck corpus, with millions of nodes representing atomic inferences.
  • COVID-19 dataset

    The dataset used in the paper comprises COVID-19 case data, state restriction policy, population and density, population with higher risk, age structure data, race structure data, and...
  • Math23k

    Math23k is the most commonly used Chinese dataset for math word problem (MWP) solving. It contains 23,162 problems: 21,162 for training, 1,000 for validation, and 1,000 for testing.
  • Eedi2020

    The dataset used in the NeurIPS 2020 Education Challenge, which contains students' answers to mathematics questions from Eedi.
  • GeoQA and GeoQA+

    Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand...
  • Math Dataset

    The Math dataset is collected from the widely used online learning system Zhixue and contains mathematical exercises and logs of high school examinations.
  • MathQA

    MathQA is an English dataset of GRE-level math problems. Unlike Math23k, the original MathQA dataset is annotated with many pre-defined operations.