Dataset - LDM

Abstraction and Reasoning Corpus (ARC)

A collection of heterogeneous visual reasoning data sets and an interesting benchmark for two reasons: First, visual reasoning programs tend to be large (in current program...
- Dataset
- JSON
Cora and Citeseer datasets

The Cora and Citeseer datasets are used for training machine learning models to classify documents into different categories.
- Dataset
- JSON
RJUA-QA

The RJUA-QA dataset is an open-source urology dataset extracted from real-world medical records, containing 2,132 QA pairs.
- Dataset
- JSON
CPQA

The CPQA dataset consists of a cloud product knowledge graph (CPKG) and QA pairs. The dataset is used for domain-specific question answering (QA) tasks.
- Dataset
- JSON
Sciq

The Sciq dataset is a multi-domain multiple-choice question dataset consisting of 13,000 questions in the fields of physics, chemistry, biology, and other natural sciences.
- Dataset
- JSON
NLVR2 and OKVQA-S

NLVR2 is a challenging VQA dataset that requires the model to compare, locate, and count objects based on the given question and images. OKVQA-S is a challenging category of...
- Dataset
- JSON
Mixture of Rationales (MoR) for Visual Question Answering

Zero-shot visual question answering (VQA) is a challenging task that requires reasoning across modalities. While some existing methods rely on a single rationale within the...
- Dataset
- JSON
VQA-HAT

The VQA-HAT dataset is used for visual grounding analysis.
- Dataset
- JSON
VQA-Introspect and VQAv2

The dataset used in the paper for the Visual Question Answering (VQA) task combines the VQA-Introspect and VQAv2 datasets.
- Dataset
- JSON
ProCQA

ProCQA is a large-scale community-based programming question answering dataset mined from StackOverflow with strict filtering strategies for quality and fairness control.
- Dataset
- JSON
Quasar-T

Open-domain question answering (QA) is a key challenge in natural language processing. A successful open-domain QA system must be able to effectively retrieve and comprehend one...
- Dataset
- JSON
Quora Question Pairs

The Quora Question Pairs dataset contains 404k English question pairs from Quora, created to test the ability of models to understand the semantics of text and determine...
- Dataset
- JSON
Florence

A large-scale dataset for visual question answering.
- Dataset
- JSON
SQuAD 2.0

The SQuAD 2.0 dataset poses a new, challenging task for natural language processing: a machine must read, understand, and answer questions about a text. The dataset...
- Dataset
- JSON
MSVD

Text-Video Retrieval (TVR) aims to align relevant video content with natural language queries. To date, most state-of-the-art TVR methods learn image-to-video transfer learning...
- Dataset
- JSON
SmartonAI dataset

The dataset used in the paper is a collection of user queries and corresponding responses generated by the SmartonAI plugin.
- Dataset
- JSON
LaMini: A Large-Scale Instruction Dataset

The LaMini approach involves generating a large-scale instruction dataset by leveraging the outputs of a large language model, gpt-3.5-turbo.
- Dataset
- JSON
SQUAD 2.0 and IMDB

The paper does not explicitly describe its dataset, but it mentions that the authors used the SQuAD 2.0 dataset for Question-Answering and the IMDB dataset for Movie...
- Dataset
- JSON
Quora dataset for question classification

Quora dataset for question classification
- Dataset
- JSON
TREC dataset for question classification

TREC dataset for question classification
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

416 datasets found
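
The API access mentioned above could look roughly like the following sketch. It assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint, which this page does not confirm. `BASE_URL` is a hypothetical placeholder, and the helper names `build_search_url` and `extract_titles` are illustrative only. Consult the API Docs for the actual endpoints.

```python
from urllib.parse import urlencode, urljoin

# ASSUMPTION: the registry exposes a CKAN-style Action API.
# BASE_URL is a hypothetical placeholder, not the registry's real address.
BASE_URL = "https://example.org/"


def build_search_url(query, rows=20, start=0, base_url=BASE_URL):
    """Build a CKAN-style package_search URL for a free-text query."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return urljoin(base_url, "api/3/action/package_search") + "?" + params


def extract_titles(response):
    """Return dataset titles from a decoded package_search JSON response."""
    if not response.get("success"):
        return []
    return [pkg.get("title", "") for pkg in response["result"]["results"]]
```

Fetching the URL with any HTTP client and passing the decoded JSON to `extract_titles` would then list the matching dataset titles; the `rows` and `start` parameters page through longer result sets.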