-
Counting dataset
The dataset used in the paper is a counting probe dataset, which consists of images and corresponding questions or statements about the number of entities in the image. -
Neural Collaborative Filtering
The dataset is used for neural collaborative filtering, which is a type of collaborative filtering that uses neural networks to learn the relationships between users and items. -
IMDB-RLHF-Pair dataset
The IMDB-RLHF-Pair dataset is generated from IMDB data; within each pair, the response with positive sentiment is preferred. -
Stack-Exchange-Paired dataset
The Stack-Exchange-Paired dataset contains questions and answers from the Stack Overflow dataset, where answers with more votes are preferred. -
Synthetic Data
The dataset used in the paper is a synthetic dataset for off-policy contextual bandits, with contexts x ∈ X, a finite set of actions A, and bounded real-valued rewards r : A → [0, 1]. -
Quora Dataset
The dataset used in this paper is a real-world dataset from Quora, containing 372,818 questions and 1,739,222 answers associated with topics, upvotes, timestamps, etc. -
Stanford Question Answering Dataset (SQuAD 2.0)
The Stanford Question Answering Dataset (SQuAD 2.0) supplements SQuAD 1.1 with over 50K unanswerable questions. -
Stanford Question Answering Dataset (SQuAD 1.1)
The Stanford Question Answering Dataset (SQuAD 1.1) is a dataset of more than 100K questions, all of which can be answered by locating a span of text in the corresponding context... -
Music-AVQA
The Music-AVQA dataset contains 9,288 videos and 45,867 question-and-answer pairs. -
Audio-Visual Question Answering
Audio-visual question answering (AVQA) requires a model to draw on both the visual content and the audio of a video, and to relate them to the question in order to predict the most accurate answer. -
Yahoo Answers
The Yahoo Answers dataset contains 730,000 questions and answers. -
Bing dataset
The Bing dataset is a large-scale dataset for natural language understanding and question answering. -
MS MARCO dataset
The MS MARCO dataset is a large-scale dataset for natural language understanding and question answering. -
Abstraction and Reasoning Corpus (ARC)
A collection of heterogeneous visual reasoning data sets and an interesting benchmark for two reasons: First, visual reasoning programs tend to be large (in current program...