119 datasets found

Tags: Question Answering

  • StockQA

    A large-scale dataset containing over 180K StockQA instances, built from Chinese online stock forums.
  • YAGO3-10

    Knowledge graphs are composed of different elements: entity nodes, relation edges, and literal nodes. Each literal node contains an entity’s attribute value (e.g. the height of...
  • QQP

    The Quora Question Pairs (QQP) dataset consists of 50,000 question pairs labeled with paraphrase or non-paraphrase.
  • NYT and WebNLG

    NYT and WebNLG are widely used datasets for relational triple extraction.
  • VisualBERT

    VisualBERT is a pre-trained model for vision-and-language tasks, built on top of PyTorch.
  • Task Driven Image Understanding Challenge (TDIUC)

    The Task Driven Image Understanding Challenge (TDIUC) dataset is a large VQA dataset with 12 more fine-grained categories proposed to compensate for the bias in distribution of...
  • WebQA, CEval, CMMLU, and MMLU

    WebQA, CEval, CMMLU, and MMLU for general chat
  • VQA 1.0

    The VQA 1.0 dataset is a large-scale dataset for visual question answering, containing 15,000 images with 50,000 questions.
  • VQA

    The VQA dataset is a large-scale visual question answering dataset that consists of image-question pairs that require natural language answers.
  • SimpleQuestion

    The SimpleQuestion dataset is a question answering dataset consisting of 100,000 questions and 1,000,000 answers.
  • MSRVTT-QA

    Video question answering (VideoQA) requires systems to understand the visual information and infer an answer for a natural language question from it.
  • Multi-Image VQA for Unsupervised Anomaly Detection

    Unsupervised anomaly detection dataset for multi-image visual question answering
  • MedMCQA

    MedMCQA is a large-scale multiple-choice question answering (MCQA) dataset designed to address real-world medical entrance exam questions.
  • OpenOrca dataset

    The dataset used for the Vectara hallucination task, containing OpenOrca questions.
  • QASiNa

    The Question Answering Sirah Nabawiyah (QASiNa) dataset is a novel dataset compiled from Sirah Nabawiyah literature in the Indonesian language.
  • Youtube2Text-QA

    A dataset for the video question answering task, which requires machines to answer questions about videos in natural language.
  • WikiSQL

    Semantic parsing maps a user-issued natural language (NL) utterance to a machine-executable meaning representation (MR), such as λ-calculus (Zettlemoyer and Collins, 2005), SQL...
  • MSMARCO

    The dataset used for training and evaluating IR systems, containing a large collection of documents and queries.
  • Universal Conceptual Cognitive Annotation (UCCA)

    The Universal Conceptual Cognitive Annotation (UCCA) dataset is annotated with UCCA, a graph-based semantic annotation scheme based on typological linguistic principles.
  • TripClick

    The TripClick dataset is a large-scale benchmark for information retrieval.
You can also access this registry using the API (see API Docs).