119 datasets found

Tags: Question Answering

  • Visual Question Answering (VQA)

    The VQA dataset consists of 248,349 training questions, 121,512 validation questions and 244,302 testing questions, generated on a total of 123,287 images.
  • NLPbench

    NLPbench is a benchmark for evaluating large language models on solving NLP problems.
  • Natural Questions: A Benchmark for Question Answering Research

    Natural Questions is a large-scale benchmark for question answering research, built from real, anonymized queries issued to the Google search engine, each paired with a Wikipedia page and annotated long and short answers.
  • Causal-VidQA

    Causal-VidQA is a video question answering dataset targeting evidence and causal reasoning; in the paper it is used to evaluate the TranSTR architecture.
  • ActivityNet-QA

    ActivityNet-QA is a video question answering (VideoQA) dataset containing 58,000 human-annotated question-answer pairs on 5,800 complex web videos drawn from ActivityNet.
  • Simple Question dataset

    The dataset used in this paper is a set of categorical probability distributions over a finite set of categories A = {a1, ..., ak}, and is used to evaluate the proposed method.
  • ProKnow-data

    The ProKnow-data dataset is a collection of diagnostic conversations guided by the safety constraints and process knowledge (ProKnow) that healthcare professionals use.
  • QASC

    QASC (Question Answering via Sentence Composition) is a multiple-choice question answering dataset about grade-school science in which answering requires composing facts from multiple sentences.
  • FB15K

    FB15K is a subset of the Freebase knowledge graph containing 14,951 entities and 1,345 relation types, widely used to benchmark knowledge graph embedding and link prediction methods.
  • NarrativeQA

    The NarrativeQA dataset is a reading comprehension challenge over full books and movie scripts, with questions written from human-generated summaries so that answering requires understanding the entire narrative.
  • Legal Document Chatbot

    A legal document chatbot developed using Langchain and Flask, capable of answering questions within the context of the Indian Constitution.
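    The entry above describes a retrieval-based chatbot: a relevant passage is retrieved from the document collection and used to answer the question. A minimal sketch of that retrieve-then-answer flow, with a toy word-overlap retriever standing in for Langchain's vector search (the passages and function names here are hypothetical illustrations, not the project's actual code):

    ```python
    # Toy retrieve-then-answer flow for a document chatbot.
    # A real system would embed passages and pass the retrieved context
    # plus the question to an LLM; here we only do keyword retrieval.

    PASSAGES = [
        "Article 14 guarantees equality before the law to all persons.",
        "Article 19 protects the freedom of speech and expression.",
        "Article 21 protects the right to life and personal liberty.",
    ]

    def tokenize(text):
        """Lowercase and split text into a set of words, dropping punctuation."""
        return {w.strip(".,?!").lower() for w in text.split()}

    def retrieve(question):
        """Return the passage with the largest word overlap with the question."""
        q = tokenize(question)
        return max(PASSAGES, key=lambda p: len(q & tokenize(p)))

    def answer(question):
        # An LLM call would go here; we return the supporting passage instead.
        return retrieve(question)

    print(answer("Which article protects freedom of speech?"))
    ```

    The same shape underlies most retrieval-augmented chatbots: only the retriever (keyword overlap vs. vector similarity) and the answer step (echoing the passage vs. prompting an LLM with it) change.
    
    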
  • Stack Overflow dataset

    The Stack Overflow dataset contains data from a question-answering forum on the topic of computer programming.
  • Hellaswag: Can a machine really finish your sentence?

    HellaSwag is a commonsense inference benchmark for sentence completion, whose machine-generated, adversarially filtered wrong endings are trivial for humans but difficult for models.
  • SuperGLUE

    The dataset used in the paper is the SuperGLUE benchmark, which comprises eight language understanding tasks: BoolQ, CB, COPA, MultiRC, ReCoRD, RTE, WiC, and WSC.
  • LLM dataset

    The dataset used in this paper is not explicitly described; the authors state only that they used a large language model (LLM) to train and evaluate their approach.
  • CLEVR-Humans

    The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR.
  • CLOSURE

    The CLOSURE dataset consists of 25,200 questions that use the same vocabulary as CLEVR but different question structures, asked about the same set of images.
  • MMLU dataset

    The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) benchmark, which consists of 57 tasks spanning STEM, the humanities, the social sciences, and other fields.
  • LLaVA 158k

    The LLaVA 158k dataset contains 158K multimodal instruction-following samples used for training and evaluating multimodal large language models.
  • Multimodal Robustness Benchmark

    The MMR benchmark is designed to evaluate MLLMs' comprehension of visual content and their robustness to misleading questions, ensuring that models truly leverage multimodal inputs.