Dataset - LDM

SimpleQuestion

SimpleQuestion is a question answering dataset consisting of 100,000 questions and 1,000,000 answers.
- Dataset
- JSON
HH-RLHF

HH-RLHF is a human preference dataset for reinforcement learning from human feedback.
- Dataset
- JSON
Kubric

Neural radiance fields (NeRF) excel at synthesizing new views given multi-view, calibrated images of a static scene. When scenes include distractors, which are not persistent...
- Dataset
- JSON
REVERIE dataset

The REVERIE dataset covers household tasks in an indoor environment. It contains images annotated with natural language instructions including the referring expressions...
- Dataset
- JSON
WebKB

A probabilistic logic programming dataset: a probabilistic version of the WebKB dataset.
- Dataset
- JSON
Pandalm Dataset

The dataset used to train Pandalm, a generative safety evaluator for Chinese.
- Dataset
- JSON
Auto-J Dataset

The dataset used to train Auto-J, a generative safety evaluator for English.
- Dataset
- JSON
Jade Dataset

The dataset used to train Jade, a linguistic-based safety evaluation platform for Chinese.
- Dataset
- JSON
ShieldLM Dataset

The dataset used to train ShieldLM, a generative safety evaluator for English.
- Dataset
- JSON
SAFETY-J Dataset

The dataset used to train SAFETY-J, a bilingual generative safety evaluator for English and Chinese.
- Dataset
- JSON
Singapore Rapid Transit Systems Regulations

Singapore Rapid Transit Systems Regulations is a collection of regulations proclaimed by the Singapore government.
- Dataset
- JSON
Universal and transferable adversarial attacks on aligned language models

AdvBench is a dataset for evaluating the safety of large language models.
- Dataset
- JSON
Social Chemistry 101: Learning to reason about social and moral norms

Social Chemistry 101 is a dataset that encompasses diverse social norms.
- Dataset
- JSON
Aligning AI with shared human values

ETHICS is a benchmark for evaluating a language model's knowledge of fundamental ethical concepts.
- Dataset
- JSON
Crows-pairs: A challenge dataset for measuring social biases in masked langua...

CrowS-Pairs is a challenge dataset for measuring social biases in masked language models.
- Dataset
- JSON
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evalua...

ALI-Agent is an evaluation framework that leverages the autonomous abilities of LLM-powered agents to probe adaptive and long-tail risks in target LLMs.
- Dataset
- JSON
Opportunity activity recognition dataset

The Opportunity activity recognition dataset contains readings from wearable, object, and ambient sensors recorded while subjects performed daily activities. It is used to benchmark human activity recognition.
- Dataset
- JSON
Helpful and Harmless

The dataset used for training and evaluating the RRHF paradigm.
- Dataset
- JSON
DocRED

DocRED is a large-scale human-annotated dataset for document-level relation extraction (RE), constructed from Wikipedia and Wikidata.
- Dataset
- JSON
TREC Deep Learning 2020

Large-scale passage retrieval aims to fetch relevant passages from a million- or billion-scale collection for a given query to meet users’ information needs, serving as an...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
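
As a rough illustration of API access, the sketch below assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint. That is an assumption, not something this page confirms, so check the API Docs first. The base URL `https://registry.example.org` is a placeholder, and the helper names `build_search_url`, `summarize`, and `search` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder address (assumption): replace with the registry's real base URL.
BASE_URL = "https://registry.example.org"


def build_search_url(base_url, query, rows=20):
    """Build a search URL, assuming a CKAN-style /api/3/action/package_search."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def summarize(payload):
    """Return (total_count, [(title, description), ...]) from a search response.

    Assumes the CKAN response shape:
    {"success": true, "result": {"count": N, "results": [{...}, ...]}}
    """
    if not payload.get("success"):
        raise ValueError("search request was not successful")
    result = payload["result"]
    entries = [(p.get("title", ""), p.get("notes", "")) for p in result["results"]]
    return result["count"], entries


def search(query, rows=20, base_url=BASE_URL):
    """Query the registry over HTTP and summarize the matching datasets."""
    with urlopen(build_search_url(base_url, query, rows)) as resp:
        return summarize(json.load(resp))
```

With a real base URL, `search("safety", rows=5)` would return the total number of matching datasets together with the titles and descriptions of the first five.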

77 datasets found