416 datasets found

  • Abstraction and Reasoning Corpus (ARC)

    A collection of heterogeneous visual reasoning datasets, and an interesting benchmark for two reasons: first, visual reasoning programs tend to be large (in current program...
  • Cora and Citeseer datasets

    The Cora and Citeseer datasets are used to train machine learning models that classify documents into categories.
  • RJUA-QA

    The RJUA-QA dataset is an open-source question-answering dataset for the urology domain, extracted from real-world medical records and containing 2,132 QA pairs.
  • CPQA

    The CPQA dataset consists of a cloud product knowledge graph (CPKG) and question-answer (QA) pairs, and is used for domain-specific QA tasks.
  • SciQ

    SciQ is a multiple-choice question dataset of 13,000 questions covering physics, chemistry, biology, and other natural sciences.
  • NLVR2 and OKVQA-S

    NLVR2 is a challenging VQA dataset that requires a model to compare, locate, and count objects given a question and a set of images. OKVQA-S is a challenging category of...
  • Mixture of Rationales (MoR) for Visual Question Answering

    Zero-shot visual question answering (VQA) is a challenging task that requires reasoning across modalities. While some existing methods rely on a single rationale within the...
  • VQA-HAT

    The VQA-HAT dataset is used for visual grounding analysis.
  • VQA-Introspect and VQAv2

    A combination of the VQA-Introspect and VQAv2 datasets, used in the paper for the Visual Question Answering (VQA) task.
  • ProCQA

    ProCQA is a large-scale, community-based programming question-answering dataset mined from Stack Overflow, with strict filtering for quality and fairness control.
  • Quasar-T

    Open-domain question answering (QA) is a key challenge in natural language processing. A successful open-domain QA system must be able to effectively retrieve and comprehend one...
  • Quora Question Pairs

    The Quora Question Pairs dataset contains 404k English question pairs from Quora, created to test models' ability to understand the semantics of text and determine...
  • Florence

    A large-scale dataset for visual question answering.
  • SQuAD 2.0

    SQuAD 2.0 is a challenging natural language processing benchmark that requires a machine to read, understand, and answer questions about a text. The dataset...
  • MSVD

    Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...
  • SmartonAI dataset

    The dataset used in the paper is a collection of user queries and corresponding responses generated by the SmartonAI plugin.
  • LaMini: A Large-Scale Instruction Dataset

    The LaMini approach generates a large-scale instruction dataset from the outputs of a large language model, gpt-3.5-turbo.
  • SQuAD 2.0 and IMDB

    The paper does not describe its dataset explicitly, but it mentions that the authors used the SQuAD 2.0 dataset for question answering and the IMDB dataset for Movie...
  • Quora dataset for question classification
  • TREC dataset for question classification