Dataset - LDM

Counting dataset

The dataset used in the paper is a counting probe dataset, which consists of images and corresponding questions or statements about the number of entities in the image.
- Dataset
- JSON
Neural Collaborative Filtering

The dataset is used for neural collaborative filtering (NCF), a collaborative filtering approach that uses neural networks to learn the relationships between users and items.
- Dataset
- JSON
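
The entry does not say which NCF architecture the paper uses. The following is a minimal sketch of one common NCF-style component: a generalized matrix factorization (GMF) scorer. It computes score = sigmoid(h · (p_u ⊙ q_i) + b) from a user vector p_u and an item vector q_i, and it is trained with binary cross-entropy. The class name `TinyGMF`, the embedding size, and the learning rate are all hypothetical illustration choices, not the paper's model.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class TinyGMF:
    """Hypothetical GMF-style scorer, one component used in NCF-style models."""

    def __init__(self, n_users: int, n_items: int, dim: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Latent vectors p_u (users) and q_i (items), small random init.
        self.user_emb = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_users)]
        self.item_emb = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]
        # Output weights h and bias b: score = sigmoid(h . (p_u * q_i) + b).
        self.h = [rng.gauss(0.0, 0.1) for _ in range(dim)]
        self.bias = 0.0

    def score(self, user: int, item: int) -> float:
        p, q = self.user_emb[user], self.item_emb[item]
        z = sum(hk * pk * qk for hk, pk, qk in zip(self.h, p, q)) + self.bias
        return sigmoid(z)

    def train_step(self, user: int, item: int, label: float, lr: float = 0.1) -> None:
        """One SGD step on binary cross-entropy (label 1 = observed, 0 = negative)."""
        p, q = self.user_emb[user], self.item_emb[item]
        err = self.score(user, item) - label  # d(BCE)/dz for a sigmoid output
        for k in range(len(self.h)):
            hk, pk, qk = self.h[k], p[k], q[k]  # use pre-update values for all three
            self.h[k] -= lr * err * pk * qk
            p[k] -= lr * err * hk * qk  # p, q alias the embedding rows, so this updates them
            q[k] -= lr * err * hk * pk
        self.bias -= lr * err
```

With implicit feedback, each observed user-item interaction is typically paired with a few sampled unobserved items that are trained as negatives (label 0).
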
IMDB-RLHF-Pair dataset

The IMDB-RLHF-Pair dataset is generated from the IMDB dataset; within each pair, the response with positive sentiment is preferred.
- Dataset
- JSON
Stack-Exchange-Paired dataset

The Stack-Exchange-Paired dataset contains questions and answers from Stack Overflow, paired so that the answer with more votes is preferred.
- Dataset
- JSON
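
The preference rule above ("answers with more votes are preferred") can be sketched as a pairing step over one question's answers. This is a minimal illustration, not the dataset's actual construction script. The `Answer` type and the `chosen`/`rejected` field names are hypothetical. Skipping tied answers is also an assumption: with equal votes, neither answer is preferred.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Answer:
    text: str
    votes: int


def preference_pairs(question: str, answers: list[Answer]) -> list[dict]:
    """Build (chosen, rejected) records; the answer with more votes is chosen.

    Hypothetical field names. Ties are skipped because neither answer is preferred.
    """
    pairs = []
    for a, b in combinations(answers, 2):
        if a.votes == b.votes:
            continue
        chosen, rejected = (a, b) if a.votes > b.votes else (b, a)
        pairs.append({"question": question, "chosen": chosen.text, "rejected": rejected.text})
    return pairs


if __name__ == "__main__":
    answers = [Answer("A", 10), Answer("B", 3), Answer("C", 3)]
    for pair in preference_pairs("How do I reverse a list?", answers):
        print(pair)
```

In the usage example, "A" is preferred over both "B" and "C", and the tied pair ("B", "C") yields no record.
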
XQuAD

The XQuAD dataset is a multilingual question answering dataset.
- Dataset
- JSON
TyDi QA

Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there...
- Dataset
- JSON
Synthetic Data

The dataset used in the paper is a synthetic dataset for off-policy contextual bandits, with contexts x ∈ X, a finite set of actions A, and bounded real-valued rewards r: A → [0, 1].
- Dataset
- JSON
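
The entry fixes only the setting: contexts x ∈ X, a finite action set A, and rewards in [0, 1]. It does not say how the paper generates its data. The following is a hypothetical generator under assumed choices: a small finite context set drawn uniformly, a uniform logging policy, and Bernoulli rewards whose means lie in [0, 1]. For evaluation it uses inverse propensity scoring (IPS), a standard off-policy estimator swapped in for illustration and not necessarily the paper's method. All names and sizes are assumptions.

```python
import random

N_CONTEXTS = 5  # hypothetical |X|
N_ACTIONS = 3   # hypothetical |A|


def make_reward_table(seed: int = 0) -> list[list[float]]:
    """Expected reward E[r(a) | x] in [0, 1) for every (context, action); hypothetical."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_CONTEXTS)]


def log_data(table, n: int, seed: int = 1):
    """Simulate logged tuples (x, a, p, r) under a uniform logging policy."""
    rng = random.Random(seed)
    logs = []
    for _ in range(n):
        x = rng.randrange(N_CONTEXTS)
        a = rng.randrange(N_ACTIONS)
        p = 1.0 / N_ACTIONS  # propensity of the logged action
        r = 1.0 if rng.random() < table[x][a] else 0.0  # Bernoulli reward, stays in [0, 1]
        logs.append((x, a, p, r))
    return logs


def greedy_policy(table):
    """Deterministic target policy: the action with the highest expected reward."""
    return lambda x: max(range(N_ACTIONS), key=lambda a: table[x][a])


def ips_value(logs, target) -> float:
    """IPS estimate of a deterministic target policy's expected reward."""
    return sum(r / p for x, a, p, r in logs if target(x) == a) / len(logs)


def true_value(table, target) -> float:
    # Contexts are uniform, so the true value is the mean of table[x][target(x)].
    return sum(table[x][target(x)] for x in range(N_CONTEXTS)) / N_CONTEXTS


if __name__ == "__main__":
    table = make_reward_table()
    logs = log_data(table, 50_000)
    pi = greedy_policy(table)
    print(ips_value(logs, pi), true_value(table, pi))
```

Because the logging policy is uniform and known, every propensity is 1/|A|. The IPS estimate is then unbiased for the target policy's value, and it approaches `true_value` as the log grows.
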
Quora Dataset

The dataset used in this paper is a real-world dataset from Quora, containing 372,818 questions and 1,739,222 answers together with their topics, upvotes, timestamps, and other metadata.
- Dataset
- JSON
Stanford Question Answering Dataset (SQuAD 2.0)

The Stanford Question Answering Dataset (SQuAD 2.0) supplements SQuAD 1.1 with over 50K unanswerable questions.
- Dataset
- JSON
Stanford Question Answering Dataset (SQuAD 1.1)

The Stanford Question Answering Dataset (SQuAD 1.1) is a dataset of more than 100K questions, each of which can be answered by locating a span of text from the corresponding context...
- Dataset
- JSON
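
The span property above (every SQuAD 1.1 answer is a span of its context) and the unanswerable questions added in SQuAD 2.0 can be checked when reading the data. The following is a minimal reader sketch. It assumes the JSON layout of the publicly distributed SQuAD files: `data` → `paragraphs` → `context` and `qas`, where each question has `answers` entries with `text` and `answer_start`, and SQuAD 2.0 adds an `is_impossible` flag. `TOY_SQUAD` is a made-up example and the file path is hypothetical.

```python
import json


def load_squad(path: str) -> dict:
    """Load a SQuAD-format JSON file (path is hypothetical)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_examples(squad: dict):
    """Yield (question, context, answer) with answer=None for unanswerable questions."""
    for article in squad["data"]:
        for para in article["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                # SQuAD 2.0 flags unanswerable questions; SQuAD 1.1 has no such field.
                if qa.get("is_impossible", False) or not qa["answers"]:
                    yield qa["question"], context, None
                    continue
                ans = qa["answers"][0]
                start = ans["answer_start"]
                span = context[start:start + len(ans["text"])]
                # The answer must be the exact character span at answer_start.
                if span != ans["text"]:
                    raise ValueError(f"offset mismatch for question {qa['id']}")
                yield qa["question"], context, span


# Made-up toy record in the assumed SQuAD 2.0 layout.
TOY_SQUAD = {
    "version": "v2.0",
    "data": [{
        "title": "Toy",
        "paragraphs": [{
            "context": "The Eiffel Tower is in Paris.",
            "qas": [
                {"id": "q1", "question": "Where is the Eiffel Tower?",
                 "answers": [{"text": "Paris", "answer_start": 23}], "is_impossible": False},
                {"id": "q2", "question": "Who painted the tower pink?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }],
}

if __name__ == "__main__":
    for question, _, answer in iter_examples(TOY_SQUAD):
        print(question, "->", answer)
```
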
AVQA

The AVQA dataset contains 57,015 videos and 57,335 question-and-answer pairs.
- Dataset
- JSON
Music-AVQA

The Music-AVQA dataset contains 9,288 videos and 45,867 question-and-answer pairs.
- Dataset
- JSON
Audio-Visual Question Answering

Audio-visual question answering (AVQA) requires a model to draw on both the visual content and the audio of a video, and to relate them to the question in order to predict the most accurate answer.
- Dataset
- JSON
Yahoo Answers

The Yahoo Answers dataset contains 730,000 questions and answers.
- Dataset
- JSON
Bing dataset

The Bing dataset is a large-scale dataset for natural language understanding and question answering.
- Dataset
- JSON
MS MARCO dataset

The MS MARCO dataset is a large-scale dataset for natural language understanding and question answering.
- Dataset
- JSON
Abstraction and Reasoning Corpus (ARC)

A collection of heterogeneous visual reasoning datasets, and an interesting benchmark for two reasons: first, visual reasoning programs tend to be large (in current program...
- Dataset
- JSON
RJUA-QA

The RJUA-QA dataset is an open-source urology question-answering dataset of 2,132 QA pairs extracted from real-world medical records.
- Dataset
- JSON
CPQA

The CPQA dataset consists of a cloud product knowledge graph (CPKG) and question-answer (QA) pairs, and is used for domain-specific QA tasks.
- Dataset
- JSON
ProCQA

ProCQA is a large-scale community-based programming question answering dataset mined from StackOverflow with strict filtering strategies for quality and fairness control.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

196 datasets found