Dataset - LDM

MS MARCO NLGen

The MS MARCO NLGen dataset is a collection of natural language generation tasks, where the goal is to generate natural-sounding answers to questions.
- Dataset
- JSON
LaiYe

The LaiYe dataset is a dataset for information-seeking conversation systems. It contains user queries and answers, and is used to evaluate the performance of information-seeking...
- Dataset
- JSON
Quora

The Quora dataset is a large-scale dataset for information-seeking conversation systems. It contains questions and answers, and is used to evaluate the performance of...
- Dataset
- JSON
FB15k-237

Knowledge graphs (KGs) are collections of facts. Some well-known knowledge graphs include Freebase (Bollacker et al., 2008), WordNet (Miller, 1995), YAGO (Suchanek et al.,...
- Dataset
- JSON
WN18RR

Knowledge graphs store a wealth of knowledge from the real world into structured graphs, which consist of collections of triplets, and each triplet (h, r, t) represents that...
- Dataset
- JSON
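
As a rough illustration of the (h, r, t) triplet structure described for knowledge graph datasets such as WN18RR and FB15k-237, the sketch below stores a few facts and groups them by head entity. The `Triple` type, the `index_by_head` helper, and the example facts are hypothetical; they are not drawn from either dataset and do not reflect the registry's JSON layout.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One knowledge-graph fact: head entity, relation, tail entity."""
    head: str
    relation: str
    tail: str


# Illustrative facts only (assumed for this sketch, not dataset records).
triples = [
    Triple("dog", "hypernym", "canine"),
    Triple("canine", "hypernym", "carnivore"),
    Triple("dog", "has_part", "paw"),
]


def index_by_head(facts):
    """Group (relation, tail) pairs under each head entity."""
    index = defaultdict(list)
    for fact in facts:
        index[fact.head].append((fact.relation, fact.tail))
    return dict(index)


graph = index_by_head(triples)
print(graph["dog"])
```

Indexing by head entity makes it cheap to list every outgoing relation of an entity, which is the lookup that link-prediction evaluation on such datasets typically relies on.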
FactCheckQA

FactCheckQA is a refreshable dataset for probing model performance in trusted source alignment.
- Dataset
- JSON
SQuAD

The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, question, and answer. The passage is a script, and the question is a...
- Dataset
- JSON
SimpleQuestion Dataset

The Simple Question dataset contains questions answerable using Wikidata as the knowledge graph.
- Dataset
- JSON
Collective classification in network data

Collective classification in network data.
- Dataset
- JSON
GeoQA and GeoQA+

Geometry Problem Solving (GPS), which is a classic and challenging math problem, has attracted much attention in recent years. It requires a solver to comprehensively understand...
- Dataset
- JSON
CommonsenseQA

CommonsenseQA is a 5-way multiple-choice QA dataset that requires commonsense knowledge.
- Dataset
- JSON
Natural Questions

The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.
- Dataset
- JSON
TriviaQA

The TriviaQA dataset is a collection of questions sourced from Quiz League websites, with sentence-level supporting facts annotation.
- Dataset
- JSON
Contextualized Sequence Likelihood

The authors used several question-answering datasets, including CoQA, TriviaQA, and Natural Questions.
- Dataset
- JSON
SST-2

The dataset used for the experiments across ten models, ranging from bag-of-words models to pre-trained transformers, and find that a model having higher AUC does not necessarily...
- Dataset
- JSON
EndoVis-17-VQLA

The EndoVis-17-VQLA dataset is a public dataset of 97 frames showing common tools and interactions from EndoVis-2017. It is annotated with question-answer-bounding box labels.
- Dataset
- JSON
EndoVis-18-VQLA

The EndoVis-18-VQLA dataset is a public dataset with 14 video sequences of robotic surgery procedures. It is combined with the bounding box on tissue-instrument interaction...
- Dataset
- JSON
FUNSD dataset

The FUNSD dataset (Form Understanding in Noisy Scanned Documents) is a collection of annotated scanned forms used for form understanding tasks.
- Dataset
- JSON
CORD dataset

The CORD dataset (Consolidated Receipt Dataset) is a collection of annotated receipt images used for post-OCR parsing.
- Dataset
- JSON
Visual7W dataset

The Visual7W dataset is a visual question answering dataset, which consists of images and corresponding questions.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

196 datasets found