1 dataset found

Tags: task generalization

  • MATHCHECK

    MATHCHECK is a well-designed checklist for testing task generalization and reasoning robustness, along with an automatic tool for swiftly generating checklists for most math...
You can also access this registry using the API (see API Docs).