Dataset - LDM

AutoCast++: Enhancing World Event Prediction with Zero-Shot Ranking-Based Con...

The AutoCast++ dataset is a benchmark for forecasting world events from news articles.
- Dataset
- JSON
CLIMA-INS

CLIMA-INS is a dataset composed of semi-structured questionnaires from insurance companies. The dataset is used to train self-supervised models for climate question answering...
- Dataset
- JSON
CLIMA-CDP

CLIMA-CDP is a dataset composed of semi-structured questionnaires from corporations. The dataset is used to train self-supervised models for climate question answering tasks.
- Dataset
- JSON
Simplifying graph convolutional networks

Simplifying graph convolutional networks.
- Dataset
- JSON
StackLLaMA: An RL fine-tuned LLaMA model for Stack Exchange question and answ...

The dataset used in the paper is the StackExchange dataset.
- Dataset
- JSON
Symbolic, Language Agnostic and Ontologically Grounded Large Language Models

The dataset is used in the paper to demonstrate the limitations of large language models (LLMs) in capturing inferential aspects of natural language.
- Dataset
- JSON
VQA

The VQA dataset is a large-scale visual question answering dataset that consists of image-question pairs that require natural language answers.
- Dataset
- JSON
A general language assistant as a laboratory for alignment

A general language assistant for aligning language models with human users.
- Dataset
- JSON
SimpleQuestion

The SimpleQuestion dataset is a question answering dataset consisting of 100,000 questions and 1,000,000 answers.
- Dataset
- JSON
Hotpot QA

The dataset includes 3,300 Hotpot QA questions.
- Dataset
- JSON
NQA

The dataset includes 3,300 NQA and 3,300 Hotpot QA questions.
- Dataset
- JSON
NQ-OPEN

The dataset includes 3,610 NQ-OPEN, 3,300 NQA, and 3,300 Hotpot QA questions.
- Dataset
- JSON
PANDA (Pedantic ANswer-correctness Determination and Adjudication)

Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current answer correctness...
- Dataset
- JSON
HH-RLHF

The HH-RLHF dataset is a human preference dataset for reinforcement learning from human feedback.
- Dataset
- JSON
Kubric

Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent...
- Dataset
- JSON
MLEC

MLEC is a Chinese multi-choice biomedical question answering dataset.
- Dataset
- JSON
MedMCQA

MedMCQA is a large-scale Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
- Dataset
- JSON
ScholarChemQA

ScholarChemQA is a large-scale QA dataset constructed from chemical papers. Specifically, the questions are from paper titles with a question mark, and the multi-choice answers...
- Dataset
- JSON
Question Answering Datasets

The dataset used in the paper is a collection of adversarial and natural examples for question answering tasks.
- Dataset
- JSON
REVERIE dataset

The REVERIE dataset covers household tasks in an indoor environment. It contains images annotated with natural language instructions including the referring expressions...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

196 datasets found