Dataset - LDM

TREC-QA

The TREC-QA dataset is a benchmark dataset for the question answering task.

Quantum Language Model with Entanglement Embedding for Question Answering

The proposed QLM-EE model is applied to the question answering task on two benchmark datasets, TREC-QA and WikiQA.

TREC DL

The TREC 2019 Deep Learning Track uses the same training and dev sets as MS MARCO but replaces the test set with a new set produced by TREC.

NQ

An open-domain QA dataset in which each example consists of a question, a retrieved article, a selected paragraph from that article, and a short answer that can be inferred from the paragraph.

User Reported Scenarios (URS) dataset

The User Reported Scenarios (URS) dataset is a collection of real-world use cases of 15 LLMs, gathered in a user study with 712 participants from 23 countries.

one-million-reddit-questions

The dataset contains 500 questions drawn from one million open-ended requests posted on AskReddit; of those requests, 129,483 were identified as asking for help.

PathVQA

The dataset used in the paper is a set of sequential vision-and-language tasks, where each task consists of an image and a text input.

MS MARCO NLGen

The MS MARCO NLGen dataset is a collection of natural language generation tasks, where the goal is to generate natural-sounding answers to questions.

Quora

The Quora dataset is a large-scale dataset for information-seeking conversation systems. It contains questions and answers and is used to evaluate the performance of...

FB15k-237

Knowledge graphs (KGs) are collections of facts. Some well-known knowledge graphs include Freebase (Bollacker et al., 2008), WordNet (Miller, 1995), YAGO (Suchanek et al.,...

WN18RR

Knowledge graphs store a wealth of real-world knowledge in structured graphs, which consist of collections of triplets; each triplet (h, r, t) represents that...

FactCheckQA

FactCheckQA is a refreshable dataset for probing model performance in trusted source alignment.

SQuAD

The dataset used in the paper is a multiple-choice reading comprehension dataset, which includes a passage, a question, and an answer. The passage is a script, and the question is a...

SimpleQuestion Dataset

The dataset used in the paper is the SimpleQuestion dataset, which contains questions answerable using Wikidata as the knowledge graph.

Collective classification in network data

Collective classification in network data.

Conditional Generative Matching Model for Multi-lingual Reply Suggestion

A Conditional Generative Matching Model for Multi-lingual Reply Suggestion

CommonsenseQA

The dataset used in the paper is referred to as CommonsenseQA, a 5-way multiple-choice QA dataset that requires commonsense knowledge.

Visual Text Question Answering (VTQA)

Visual Text Question Answering (VTQA) is a new challenge introduced together with a corresponding dataset of 23,781 questions based on 10,124 image-text pairs.

Natural Questions

The Natural Questions dataset consists of questions extracted from web queries, with each question accompanied by a corresponding Wikipedia article containing the answer.

TriviaQA

The TriviaQA dataset is a collection of questions sourced from Quiz League websites, with sentence-level supporting-fact annotations.

416 datasets found