Dataset - LDM

Corpus Pairs Dataset

Corpus pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.
- Dataset
- JSON
Minimal Pairs Dataset

Minimal pairs dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.
- Dataset
- JSON
Sentiment Training Dataset

Sentiment training dataset for LABDet, a robust and language-agnostic bias probing method to quantify intrinsic bias in monolingual PLMs.
- Dataset
- JSON
QQP

The Quora Question Pairs (QQP) dataset consists of 50,000 question pairs, each labeled as paraphrase or non-paraphrase.
- Dataset
- JSON
BEiT

The BEiT dataset used for the experiments in the paper.
- Dataset
- JSON
D3

The D3 dataset contains a curated sample of social media posts from Jigsaw datasets (Jigsaw, 2019, 2018), annotated for offensiveness in text.
- Dataset
- JSON
DICES-350

The DICES-350 dataset is a curated sample from a corpus of 8k multi-turn conversations generated by human agents interacting with a generative AI chatbot (Thoppilan et al., 2022) in an...
- Dataset
- JSON
GRASP: A Disagreement Analysis Framework to Assess Group Associations in Pers...

Human annotation plays a core role in machine learning — annotations for supervised models, safety guardrails for generative models, and human feedback for reinforcement...
- Dataset
- JSON
ChatGPT: A conversational AI model

The dataset used in the paper ChatGPT: A conversational AI model.
- Dataset
- JSON
Latent Distance Guided Alignment Training for Large Language Models

Ensuring alignment with human preferences is a crucial characteristic of large language models (LLMs). Presently, the primary alignment methods, RLHF and DPO, require extensive...
- Dataset
- JSON
ParSEL: Parameterized Shape Editing with Language

ParSEL: Parameterized Shape Editing with Language, a system that enables controllable editing of 3D assets with natural language.
- Dataset
- JSON
Temporal Sentence Grounding in Videos

Temporal sentence grounding in videos (TSGV) is a task to retrieve a video segment that semantically corresponds to a query in natural language.
- Dataset
- JSON
APIBank

APIBank is a comprehensive benchmark for tool-augmented LLMs, focusing on API calling, retrieving, and planning abilities.
- Dataset
- JSON
APIBench

APIBench is a comprehensive benchmark for tool-augmented LLMs, focusing on API calling, retrieving, and planning abilities.
- Dataset
- JSON
GTA: A Benchmark for General Tool Agents

GTA is a benchmark for General Tool Agents, featuring three main aspects: real user queries, real deployed tools, and real multimodal inputs.
- Dataset
- JSON
LLMBI

The Large Language Model Bias Index (LLMBI) is a pioneering approach designed to quantify and address biases inherent in large language models (LLMs), such as GPT-4.
- Dataset
- JSON
RateMyProfessor Dataset

RateMyProfessor dataset, a dataset of student-written reviews for professors.
- Dataset
- JSON
Bias in Bios Dataset

Bias in Bios dataset, a personal biography dataset with information extracted from Wikipedia.
- Dataset
- JSON
Language Agency Classification (LAC) Dataset

Language Agency Classification (LAC) dataset for training accurate language agency classifiers.
- Dataset
- JSON
Reference Letter Dataset

Reference letter dataset generated under the Context-Based Generation (CBG) setting.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

420 datasets found