Dataset - LDM

TREC dataset

The dataset used in the paper is the TREC dataset, which consists of 124 queries.
- Dataset
- JSON
CGQA

CGQA is a large dataset covering 413 attributes and 674 object categories.
- Dataset
- JSON
SQuAD: 100,000+ Questions for Machine Comprehension of Text

SQuAD is a reading-comprehension benchmark with more than 100,000 questions posed on Wikipedia articles; the answer to each question is a span of text from the corresponding passage.
- Dataset
- JSON
FarFetched: Entity-centric Reasoning and Claim Validation for the Greek Language

FarFetched is a modular framework that enables people to verify any kind of textual claim based on the incorporated evidence from textual news sources.
- Dataset
- JSON
CounterFact

The dataset used in the paper is a collection of irrelevant questions that are more challenging than the ones in existing datasets.
- Dataset
- JSON
bAbI Question Answering dataset

The bAbI Question Answering dataset is a benchmark for evaluating the ability of RNNs to answer questions.
- Dataset
- JSON
Brilla AI Dataset

A dataset of NSMQ contests from 2012 to 2022, containing contest videos with corresponding metadata, riddle questions in text form, and open-source science textbooks.
- Dataset
- JSON
WebQuestions

The task of Question Answering over Linked Data (QALD) has received increased attention over the last years (see the surveys [14] and [36]). The task consists in mapping natural...
- Dataset
- JSON
InsuranceQA

The InsuranceQA dataset is a question answering dataset containing questions and answers.
- Dataset
- JSON
Conversational dataset

The conversational dataset is used to evaluate the performance of the proposed algorithms. The dataset consists of 20,000 questions and answers, where each question is answered...
- Dataset
- JSON
LAMBADA

The dataset used in the paper is a corpus of text containing approximately 10,000 examples, each a sequence of sentences extracted from books.
- Dataset
- JSON
EuSQuAD

EuSQuAD: Automatically Translated and Aligned SQuAD2.0 for Basque
- Dataset
- JSON
TruthfulQA

TruthfulQA contains 817 questions designed to evaluate language models' tendency to mimic human falsehoods.
- Dataset
- JSON
BioASQ

The BioASQ dataset contains questions and answers from various sources, including Wikipedia and biomedical literature.
- Dataset
- JSON
SNLI dataset

The dataset used in the paper is the SNLI dataset.
- Dataset
- JSON
DEFT

DEFT consists of two categories of definitions: a) Contracts: involving 2,433 sentences from the 2017 SEC contract filing, and b) Textbook: involving 21,303 sentences from...
- Dataset
- JSON
FVQA

FVQA is a fact-based visual question answering dataset, containing 2190 images and 5826 (question, answer) pairs, with supporting facts selected from knowledge bases.
- Dataset
- JSON
Question Answering Based Clinical Text Structuring

Clinical text structuring is a critical and fundamental task for clinical research. Traditional methods such as task-specific end-to-end models and pipeline models usually...
- Dataset
- JSON
NeuralQA

A usable library for question answering on large datasets
- Dataset
- JSON
QQP Dataset

The QQP dataset contains more than 400k question pairs.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
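
The page does not say which software serves the API, so the following is only a rough sketch. It assumes a CKAN-style Action API with a `package_search` endpoint; that is a guess, not something this listing confirms. `BASE_URL` is a placeholder, and the helper names (`build_search_url`, `summarize`, `search`) are hypothetical. Check the API Docs for the real endpoint before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder (assumption): replace with the registry's real base URL.
BASE_URL = "https://registry.example.org"


def build_search_url(base_url, query, rows=20, start=0):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    params = urlencode({"q": query, "rows": rows, "start": start})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(payload):
    """Return (total_count, titles) from a CKAN-style search response."""
    if not payload.get("success"):
        raise ValueError("search request was not successful")
    result = payload["result"]
    # Fall back to the machine name when a dataset has no display title.
    titles = [pkg.get("title") or pkg.get("name") for pkg in result["results"]]
    return result["count"], titles


def search(base_url, query, rows=20):
    """Fetch one page of results; needs network access and a real base URL."""
    with urlopen(build_search_url(base_url, query, rows)) as resp:
        return summarize(json.load(resp))
```

With a real base URL, `search(BASE_URL, "question answering")` would return the total number of matches and the titles on the first page. Larger result sets can be paged by raising `start` in `build_search_url`.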

119 datasets found