9 datasets found

  • COMET

    COMET is a model for commonsense reasoning that can generate coherent and contextually relevant text.
  • CSQA

    The CSQA dataset is a widely used benchmark for conversational KBQA, consisting of around 200K dialogues, where the training, validation, and test sets contain 153K,...
  • StrategyQA

    The StrategyQA dataset is used to evaluate the ability of LLMs to generate accurate answers to multi-step reasoning questions.
  • ROCStories (+GPT-J)

    A corpus and cloze evaluation for deeper understanding of commonsense stories.
  • ROCStories

    The ROCStories corpus is a collection of crowdsourced five-sentence everyday stories rich in causal and temporal relations.
  • A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories

    A corpus and cloze evaluation for deeper understanding of commonsense stories.
  • CommonGen

    Commonsense generation aims to generate a realistic sentence describing a daily scene under the given concepts, which is very challenging, since it requires models to have...
  • CommonsenseQA-3k

    CommonsenseQA-3k is a 3,903-example dataset for commonsense reasoning.
  • GLUCOSE

    GLUCOSE is a large-scale dataset of implicit commonsense knowledge, encoded as causal mini-theories about the world, each grounded in a narrative context.