Dataset - LDM

Modeling Task Interactions in Document-Level Joint Entity and Relation Extrac...

Document-level relation extraction in an end-to-end setting, where the model needs to jointly perform mention extraction, coreference resolution (COREF) and relation extraction...
- Dataset
- JSON
Room-to-Room (R2R) dataset

The Room-to-Room (R2R) dataset is a benchmark for vision-and-language navigation tasks. It consists of 7,189 paths sampled from its navigation graphs, each with three...
- Dataset
- JSON
Wiki-BM

The Wiki-BM dataset is a benchmark for the Split and Rephrase task, consisting of Wikipedia data.
- Dataset
- JSON
BiSECT

The BiSECT dataset is a benchmark for the Split and Rephrase task, consisting of bitexts.
- Dataset
- JSON
WebSplit

The WebSplit dataset is a benchmark for the Split and Rephrase task, consisting of RDF semantic tuples.
- Dataset
- JSON
TruthfulQA

The TruthfulQA dataset contains 817 questions designed to evaluate language models' tendency to mimic human falsehoods.
- Dataset
- JSON
SciFact

The SciFact dataset is a collection of scientific claims paired with evidence-containing research abstracts, used for scientific claim verification.
- Dataset
- JSON
Simplifying graph convolutional networks

Simplifying graph convolutional networks.
- Dataset
- JSON
StackLLaMA: An RL fine-tuned LLaMA model for Stack Exchange question and answ...

The dataset used in the paper is the StackExchange dataset.
- Dataset
- JSON
Symbolic, Language Agnostic and Ontologically Grounded Large Language Models

This dataset is used in the paper to demonstrate the limitations of large language models (LLMs) in capturing inferential aspects of natural language.
- Dataset
- JSON
SimpleQuestion

The SimpleQuestion dataset is a question-answering benchmark consisting of 100,000 questions and 1,000,000 answers.
- Dataset
- JSON
HH-RLHF

The HH-RLHF dataset is a human preference dataset for reinforcement learning from human feedback.
- Dataset
- JSON
Kubric

Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent...
- Dataset
- JSON
REVERIE dataset

The REVERIE dataset is a dataset of household tasks in an indoor environment. It contains images annotated with natural language instructions including the referring expressions...
- Dataset
- JSON
WebKB

The dataset used in this paper is a probabilistic logic programming dataset, which is a probabilistic version of the WebKB dataset.
- Dataset
- JSON
Pandalm Dataset

This dataset was used to train Pandalm, a generative safety evaluator for Chinese.
- Dataset
- JSON
Auto-J Dataset

This dataset was used to train Auto-J, a generative safety evaluator for English.
- Dataset
- JSON
Jade Dataset

This dataset was used to train Jade, a linguistic-based safety evaluation platform for Chinese.
- Dataset
- JSON
ShieldLM Dataset

This dataset was used to train ShieldLM, a generative safety evaluator for English.
- Dataset
- JSON
SAFETY-J Dataset

This dataset was used to train SAFETY-J, a bilingual generative safety evaluator for English and Chinese.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

67 datasets found