Dataset - LDM

Yahoo Answers topics

The dataset used in this paper for the few-shot text classification task.
- Dataset
- JSON
Visual Question Answering (VQA)

The VQA dataset consists of 248,349 training questions, 121,512 validation questions and 244,302 testing questions, generated on a total of 123,287 images.
- Dataset
- JSON
VQAv2 dataset

The VQAv2 dataset contains open-ended questions on 265k images, with 5.4 questions per image on average.
- Dataset
- JSON
NLPbench

The dataset is used for evaluating large language models on solving NLP problems.
- Dataset
- JSON
Natural Questions: A Benchmark for Question Answering Research

A benchmark for question answering research is introduced, which includes a large dataset of natural questions.
- Dataset
- JSON
Causal-VidQA

This dataset is used in the paper to evaluate the performance of the TranSTR architecture.
- Dataset
- JSON
ActivityNet-QA

Video question answering (VideoQA) is an essential task in vision-language understanding, which has recently attracted considerable research attention. Nevertheless, existing works...
- Dataset
- JSON
Simple Question dataset

The dataset used in this paper is a set of categorical probability distributions for a finite set of categories A = {a1,..., ak}. The dataset is used to evaluate the proposed...
- Dataset
- JSON
ProKnow-data

The ProKnow-data dataset is a collection of diagnostic conversations guided by safety constraints and ProKnow that healthcare professionals use.
- Dataset
- JSON
QASC

Explanation Gold Standards (XGSs) are emerging as a fundamental enabling tool for step-wise and explainable Natural Language Inference (NLI).
- Dataset
- JSON
FB15K

Knowledge graphs (KGs) such as Freebase (Bollacker et al. 2008), DBpedia (Auer et al. 2007), and YAGO (Mahdisoltani, Biega, and Suchanek 2014) play a critical role in various...
- Dataset
- JSON
NarrativeQA

The NarrativeQA dataset is a reading comprehension challenge that focuses on questions with a single entity and relation.
- Dataset
- JSON
Legal Document Chatbot

A legal document chatbot developed using LangChain and Flask, capable of answering questions within the context of the Indian Constitution.
- Dataset
- JSON
Stack Overflow dataset

The Stack Overflow dataset contains data from a question-answering forum on the topic of computer programming.
- Dataset
- JSON
Hellaswag: Can a machine really finish your sentence?

Hellaswag: Can a machine really finish your sentence?
- Dataset
- JSON
SuperGLUE

The dataset used in the paper is the SuperGLUE benchmark, which includes 17 tasks: STS-B, MRPC, MNLI, QNL, QNLI, CoLA, SST-2, MRPC, GLUE, NLI, NQ, ReCoRD, ReCoRD-Sub,...
- Dataset
- JSON
LLM dataset

The dataset used in this paper is not explicitly described, but it is mentioned that it is a large language model (LLM) and that the authors used it to train and evaluate their...
- Dataset
- JSON
CLEVR-Humans

The CLEVR-Humans dataset consists of 32,164 questions asked by humans, containing words and reasoning steps that were unseen in CLEVR.
- Dataset
- JSON
CLOSURE

The CLOSURE dataset consists of 25,200 questions with identical vocabulary but different structure than CLEVR, asked on the same set of images.
- Dataset
- JSON
MMLU dataset

The dataset used in the paper is the Massive Multitask Language Understanding (MMLU) dataset, which consists of 57 tasks from Science, Technology, Engineering, and Math (STEM),...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

135 datasets found