-
The TechQA dataset
TechQA: a dataset for question answering on technical support articles -
MS MARCO Dev (small)
The MS MARCO Dev (small) dataset is a 6,980-query subset of the MS MARCO passage dev set. -
TREC 2020 Deep Learning (Passage Subtask)
The TREC 2020 Deep Learning (Passage Subtask) dataset consists of 54 queries with manual judgments from NIST annotators (211 relevance assessments per query, on average). -
TREC 2019 Deep Learning (Passage Subtask)
The TREC 2019 Deep Learning (Passage Subtask) dataset consists of 43 manually judged queries with four relevance grades (215 relevance assessments per query, on average). -
SemEval-2013 Task 13
The SemEval-2013 Task 13 dataset contains 20 nouns, 20 verbs, and 10 adjectives in WordNet-sense-tagged contexts. -
DSTC2 dialog dataset
The DSTC2 dialog dataset, from the second Dialog State Tracking Challenge, consists of goal-oriented dialogs between human users and a spoken dialog system in the restaurant-information domain. -
bAbI dialog dataset
The bAbI dialog dataset consists of 5 different tasks, each of which has 1,000 synthetically-generated goal-oriented dialogs between a user and the system in the domain of... -
bAbI story-based QA dataset
The bAbI story-based QA dataset is composed of 20 different tasks, each of which has 1,000 synthetically-generated story-question pairs. A story can be as short as two sentences... -
Semantic communications: Principles and challenges
This dataset has no description
-
Task-oriented multi-user semantic communications for VQA
This dataset has no description
-
Measuring Massive Multitask Language Understanding
The dataset used in this paper is a set of multiple-choice questions for evaluating large language models. -
Discord Questions: A Computational Approach To Diversity Analysis in News Cov...
The dataset was used in the paper to evaluate how effectively the Annotated Article, Recomposed Article, and Question Grid interfaces highlight diversity in news coverage. -
A dataset of clinically generated visual questions and answers about radiolog...
A dataset of clinically generated visual questions and answers about radiology images. -
Med-HallMark
Med-HallMark is a benchmark for detecting and evaluating hallucinations in medical multimodal language models.