Dataset - LDM

TREC Deep Learning 2019

Large-scale passage retrieval aims to fetch relevant passages from a million- or billion-scale collection for a given query to meet users’ information needs, serving as an...
- Dataset
- JSON
WN18RR

Knowledge graphs store a wealth of knowledge from the real world into structured graphs, which consist of collections of triplets, and each triplet (h, r, t) represents that...
- Dataset
- JSON
SQuAD

The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...
- Dataset
- JSON
SimpleQuestion Dataset

The dataset used in the paper is the SimpleQuestion dataset, which contains questions answerable using Wikidata as the knowledge graph.
- Dataset
- JSON
Collective classification in network data

Collective classification in network data.
- Dataset
- JSON
FUNSD dataset

The FUNSD dataset is a dataset for form understanding in noisy scanned documents.
- Dataset
- JSON
CORD dataset

The CORD dataset is a consolidated receipt dataset for post-OCR parsing.
- Dataset
- JSON
Neural Collaborative Filtering

The dataset is used for neural collaborative filtering, which is a type of collaborative filtering that uses neural networks to learn the relationships between users and items.
- Dataset
- JSON
IMDB-RLHF-Pair dataset

The IMDB-RLHF-Pair dataset is generated by IMDB, and responses with positive sentiment are preferred.
- Dataset
- JSON
Stack-Exchange-Paired dataset

The Stack-Exchange-Paired dataset contains questions and answers from the Stack Overflow dataset, where answers with more votes are preferred.
- Dataset
- JSON
Synthetic Data

The dataset used in the paper is a synthetic dataset for off-policy contextual bandits, with contexts x ∈ X, a finite set of actions A, and bounded real rewards r ∈ A → [0, 1].
- Dataset
- JSON
Abstraction and Reasoning Corpus (ARC)

A collection of heterogeneous visual reasoning data sets and an interesting benchmark for two reasons: First, visual reasoning programs tend to be large (in current program...
- Dataset
- JSON
Cora and Citeseer datasets

The Cora and Citeseer datasets are used for training machine learning models to classify documents into different categories.
- Dataset
- JSON
Kinetics-400

Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming....
- Dataset
- JSON
Quora Question Pairs

The Quora Question Pairs dataset contains 404k English question pairs on Quora, created to test the abilities of the models to understand the semantics from text, and determine...
- Dataset
- JSON
SimpleQuestion dataset for adaptive learning

The dataset used in this paper is a collection of questions and answers related to adaptive learning and generative AI.
- Dataset
- JSON
SimpleQuestion dataset for Wikidata

The dataset used in this paper is a reinforcement learning dataset, specifically the SimpleQuestion dataset, which contains questions answerable using Wikidata as the knowledge...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

77 datasets found