-
SimpleQuestion
SimpleQuestion is a question answering dataset consisting of 100,000 questions and 1,000,000 answers. -
REVERIE dataset
REVERIE is a dataset of household tasks in an indoor environment. It contains images annotated with natural language instructions including the referring expressions... -
PandaLM Dataset
The dataset used to train PandaLM, a generative safety evaluator for Chinese. -
Auto-J Dataset
The dataset used to train Auto-J, a generative safety evaluator for English. -
Jade Dataset
The dataset used to train Jade, a linguistics-based safety evaluation platform for Chinese. -
ShieldLM Dataset
The dataset used to train ShieldLM, a generative safety evaluator for English. -
SAFETY-J Dataset
The dataset used to train SAFETY-J, a bilingual generative safety evaluator for English and Chinese. -
Singapore Rapid Transit Systems Regulations
Singapore Rapid Transit Systems Regulations is a collection of regulations proclaimed by the Singapore government. -
Universal and transferable adversarial attacks on aligned language models
AdvBench is a dataset for evaluating the safety of large language models. -
Social Chemistry 101: Learning to reason about social and moral norms
Social Chemistry 101 is a dataset that encompasses diverse social norms. -
Aligning AI with shared human values
ETHICS is a benchmark for evaluating a language model's knowledge of fundamental ethical concepts. -
CrowS-Pairs: A challenge dataset for measuring social biases in masked langua...
CrowS-Pairs is a challenge dataset for measuring social biases in masked language models. -
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evalua...
ALI-Agent is an evaluation framework that leverages the autonomous abilities of LLM-powered agents to probe adaptive and long-tail risks in target LLMs. -
Opportunity activity recognition dataset
The Opportunity activity recognition dataset contains recordings from wearable, object, and ambient sensors, collected for benchmarking human activity recognition. -
Helpful and Harmless
The dataset used for training and evaluation of the proposed RRHF paradigm. -
TREC Deep Learning 2020
Large-scale passage retrieval aims to fetch relevant passages from a million- or billion-scale collection for a given query to meet users’ information needs, serving as an...