MATHCHECK is a well-designed checklist for testing task generalization and reasoning robustness, along with an automatic tool for swiftly generating checklists for most math...
The Qiyas benchmark is a standardized General Aptitude Test (GAT) used for university admissions in Saudi Arabia, which supports its quality and relevance to real-world assessment. It...
MathQA is a dataset of English mathematical problems at GRE level. The original MathQA dataset is annotated differently from Math23k, using many pre-defined operations.