5 datasets found

Formats: JSON
Tags: commonsense reasoning

  • COMET

    COMET is a model for commonsense reasoning that generates coherent, contextually relevant commonsense inferences as natural-language text.
  • CommonsenseQA and OpenBookQA

    CommonsenseQA and OpenBookQA are two of the most widely used commonsense reasoning benchmarks.
  • CSQA

    The CSQA dataset is a widely used benchmark for conversational question answering over knowledge bases (KBQA). It consists of around 200K dialogues, of which the training, validation, and test sets contain 153K,...
  • Jericho

    A dataset of 32 interactive fiction games spanning genres such as dungeon crawl, sci-fi, mystery, comedy, and horror.
  • StrategyQA

    The StrategyQA dataset is used to evaluate how accurately LLMs answer questions that require multi-step reasoning.
You can also access this registry using the API (see API Docs).