6 datasets found

  • SVAMP

    The SVAMP dataset contains natural language math problems from various sources, including textbooks and online resources.
  • MathVista

    MathVista is a benchmark for evaluating mathematical reasoning in visual contexts.
  • MATHCHECK

    MATHCHECK is a well-designed checklist for testing task generalization and reasoning robustness, along with an automatic tool for swiftly generating checklists for most math...
  • Qiyas Benchmark

    The Qiyas benchmark is a standardized General Aptitude Test (GAT) used for university admissions in Saudi Arabia, which ensures its quality and relevance to real-world assessment. It...
  • GSM8K

    Mathematical reasoning tasks involve mapping a question into a series of equations, which are then solved to obtain the final answer.
  • MathQA

    MathQA is a dataset of English math problems at GRE level. Unlike Math23k, the original MathQA dataset is annotated with many pre-defined operations.