Dataset - LDM

Universal and transferable adversarial attacks on aligned language models

AdvBench is a dataset for evaluating the safety of large language models.
- Dataset
- JSON
Social Chemistry 101: Learning to reason about social and moral norms

Social Chemistry 101 is a dataset that encompasses diverse social norms.
- Dataset
- JSON
Aligning AI with shared human values

ETHICS is a benchmark for evaluating a language model's knowledge of fundamental ethical concepts.
- Dataset
- JSON
CrowS-Pairs: A challenge dataset for measuring social biases in masked langua...

CrowS-Pairs is a challenge dataset for measuring social biases in masked language models.
- Dataset
- JSON
ALI-Agent: Assessing LLMs' Alignment with Human Values via Agent-based Evalua...

ALI-Agent is an evaluation framework that leverages the autonomous abilities of LLM-powered agents to probe adaptive and long-tail risks in target LLMs.
- Dataset
- JSON
Opportunity activity recognition dataset

Opportunity activity recognition dataset contains questions answerable using Wikidata as the knowledge graph, focusing on questions with a single entity and relation.
- Dataset
- JSON
DISC-MedLLM

DISC-MedLLM: Bridging general large language models and real-world medical consultation.
- Dataset
- JSON
CMB-Exam

CMB-Exam is a large-scale Chinese benchmark for evaluating medical large language models. The dataset consists of 280,839 samples across 74 tasks and covers 24 departments and 150 diseases.
- Dataset
- JSON
TruthX: Alleviating Hallucinations by Editing Large Language Models

TruthX: Alleviating Hallucinations by Editing Large Language Models
- Dataset
- JSON
Helpful and Harmless

Helpful and Harmless is the dataset used for training and evaluating the RRHF paradigm.
- Dataset
- JSON
Distance-based approaches to repair semantics in ontology-based data access

The dataset used in this paper is a set of repairs for a knowledge base, with each repair being a maximal R-consistent subset of facts.
- Dataset
- JSON
Math23k

Math23k is the most commonly used Chinese dataset for math word problem (MWP) solving. It contains 23,162 problems: 21,162 for training, 1,000 for validation, and 1,000 for testing.
- Dataset
- JSON
DocRED

DocRED is a large-scale human-annotated dataset for document-level relation extraction (RE), constructed from Wikipedia and Wikidata.
- Dataset
- JSON
DBpedia

DBpedia is a public knowledge graph which is derived from structured information in Wikipedia, mainly infoboxes.
- Dataset
- JSON
RadQA: A question answering dataset to improve comprehension of radiology rep...

RadQA: A question answering dataset to improve comprehension of radiology reports
- Dataset
- JSON
TREC Deep Learning 2020

Large-scale passage retrieval aims to fetch relevant passages from a million- or billion-scale collection for a given query to meet users’ information needs, serving as an...
- Dataset
- JSON
TREC Deep Learning 2019

Large-scale passage retrieval aims to fetch relevant passages from a million- or billion-scale collection for a given query to meet users’ information needs, serving as an...
- Dataset
- JSON
Temporal Alignment of Pretrained Language Models

This paper introduces the TemporalAlignmentQA (TAQA) dataset, which contains 20,148 time-sensitive questions and their answers for each year from 2000 to 2023.
- Dataset
- JSON
Declaration dataset

The declaration dataset is generated from the annotations of the GQA dataset and contains questions paired with their corresponding declarative sentences.
- Dataset
- JSON
VQA v2.0

VQA v2.0 is used to evaluate the proposed joint model. Its answers are balanced in order to minimize the effectiveness of learning dataset priors.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

416 datasets found