- CommonsenseQA and OpenBookQA
CommonsenseQA and OpenBookQA are two of the most widely used commonsense reasoning benchmarks.
- StrategyQA
The StrategyQA dataset is used to evaluate the ability of LLMs to generate accurate answers to multi-step reasoning questions.