TruthfulQA
The TruthfulQA dataset contains 817 questions designed to measure whether language models mimic common human falsehoods.
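A minimal sketch of scoring the benchmark's multiple-choice (MC1) track with the Hugging Face `datasets` library; `score_choice` is a hypothetical hook standing in for the model's likelihood scorer:

```python
from datasets import load_dataset

# TruthfulQA's multiple-choice configuration: "mc1_targets" holds one
# correct answer plus distractors for each question.
ds = load_dataset("truthful_qa", "multiple_choice")["validation"]

def mc1_accuracy(dataset, score_choice):
    """MC1 accuracy: the model must rank the single correct choice highest.
    `score_choice(question, choice)` is a hypothetical hook returning the
    model's log-likelihood for answering `choice` to `question`."""
    correct = 0
    for ex in dataset:
        choices = ex["mc1_targets"]["choices"]
        labels = ex["mc1_targets"]["labels"]  # 1 = true answer, 0 = falsehood
        scores = [score_choice(ex["question"], c) for c in choices]
        if labels[scores.index(max(scores))] == 1:
            correct += 1
    return correct / len(dataset)
```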
Evaluating Large Language Models Trained on Code
The paper presents OpenAI Codex and evaluates its ability to generate functionally correct Python code from docstrings.
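The paper measures functional correctness with the unbiased pass@k estimator: generate n samples per problem, count the c that pass the unit tests, and estimate 1 - C(n-c, k)/C(n, k). A direct transcription of the numerically stable form given in the paper:

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the Codex paper:
    1 - C(n-c, k) / C(n, k), in numerically stable product form.
    n: total samples generated; c: samples passing the unit tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# Example: 200 samples per problem, 31 pass -> estimated pass@1
print(pass_at_k(200, 31, 1))  # ≈ 0.155
```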
Confidence Calibration in Large Language Models
The dataset used in this study supports analysis of the self-assessment behavior of large language models.
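The entry does not specify the metric, but a standard way to quantify how well a model's stated confidence matches its accuracy is expected calibration error (ECE). A minimal sketch; the equal-width binning scheme is an assumption, not necessarily the study's protocol:

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE: bin predictions by stated confidence and compare
    each bin's mean confidence against its empirical accuracy.
    `confidences` in [0, 1]; `correct` is a 0/1 array of the same length."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(confidences, edges[1:-1])  # bin index in [0, n_bins-1]
    ece = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

print(expected_calibration_error([0.9, 0.8, 0.6], [1, 1, 0]))
```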
Proof-Pile-2
The dataset used for continual pre-training of large language models on mathematical and scientific text, with a focus on balancing the data distribution and mitigating overfitting.
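A sketch of streaming a subset for pre-training data inspection; the hub ID `EleutherAI/proof-pile-2`, the `arxiv` subset name, and the `text` field are assumptions to verify against the dataset card:

```python
from itertools import islice
from datasets import load_dataset

# Hub ID, subset name, and field name are assumptions; check the card.
ds = load_dataset("EleutherAI/proof-pile-2", "arxiv",
                  split="train", streaming=True)

for example in islice(ds, 3):
    print(example["text"][:200])  # assumes a "text" field per record
```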
Multi-XScience
The dataset is a multi-document summarization collection built from scientific articles, accompanied by human evaluators' ratings of existing summaries.
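Alongside human ratings, generated summaries are commonly scored against references with ROUGE; a minimal sketch with the `rouge_score` package, not necessarily the paper's exact protocol:

```python
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"],
                                  use_stemmer=True)
reference = "Prior work studies multi-document summarization of scientific articles."
candidate = "Earlier studies address summarizing multiple scientific papers."

# score(target, prediction) returns per-metric precision/recall/F1.
for name, s in scorer.score(reference, candidate).items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")
```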
Multimodal Visual Patterns (MMVP) Benchmark
The Multimodal Visual Patterns (MMVP) benchmark evaluates the visual question answering capabilities of multimodal large language models (MLLMs), using question pairs built from CLIP-blind image pairs.
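MMVP is scored at the pair level: a model gets credit only when it answers both questions about a CLIP-blind image pair correctly. A minimal sketch of that aggregation; the `results` format is an assumption:

```python
from collections import defaultdict

def mmvp_pair_accuracy(results):
    """Pair-level scoring as described for MMVP: credit only when both
    questions in a pair are answered correctly.
    `results` is a list of (pair_id, is_correct) tuples."""
    pairs = defaultdict(list)
    for pair_id, ok in results:
        pairs[pair_id].append(ok)
    return sum(all(v) for v in pairs.values()) / len(pairs)

# Two pairs; only the first is fully correct.
print(mmvp_pair_accuracy([(0, True), (0, True), (1, True), (1, False)]))  # 0.5
```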
PANDA (Pedantic ANswer-correctness Determination and Adjudication)
Question answering (QA) can only make progress if we know if an answer is correct, but for many of the most challenging and interesting QA examples, current answer correctness...
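The usual baseline such work targets is string-based exact match; a minimal sketch of normalized EM, assuming the common SQuAD-style normalization rules rather than anything specific to PANDA:

```python
import re
import string

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, references):
    """1.0 if the normalized prediction matches any normalized reference."""
    return float(any(normalize(prediction) == normalize(r) for r in references))

print(exact_match("The Eiffel Tower", ["Eiffel Tower"]))  # 1.0
```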
Slot-VLM: SlowFast Slots for Video-Language Modeling
Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the...
Chatbot Arena
Chatbot Arena is an open platform that evaluates LLMs through crowdsourced, pairwise human preference votes, aggregated into model rankings.
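The arena turns pairwise votes into rankings; the original report used online Elo updates over the battle log (later refined with Bradley-Terry fitting). A minimal sketch with the commonly used constants (K=4, initial rating 1000):

```python
from collections import defaultdict

def elo_ratings(battles, k=4, scale=400, base=10, init=1000):
    """Online Elo over pairwise battles. `battles` yields
    (model_a, model_b, winner), winner in {"model_a", "model_b", "tie"}."""
    ratings = defaultdict(lambda: float(init))
    for a, b, winner in battles:
        ea = 1 / (1 + base ** ((ratings[b] - ratings[a]) / scale))
        sa = {"model_a": 1.0, "model_b": 0.0, "tie": 0.5}[winner]
        ratings[a] += k * (sa - ea)
        ratings[b] += k * ((1 - sa) - (1 - ea))
    return dict(ratings)

print(elo_ratings([("gpt-4", "llama-2", "model_a"),
                   ("gpt-4", "claude", "tie")]))
```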
Arena-Hard
Arena-Hard is a benchmark of challenging user prompts curated from Chatbot Arena data, used to evaluate LLMs automatically with an LLM judge against a fixed baseline.
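Arena-Hard reports a model's judged win rate against the baseline; a minimal sketch of the aggregation, counting ties as half a win (the judging prompt and any finer-grained verdict weighting follow the benchmark's own tooling and are omitted here):

```python
def win_rate(verdicts):
    """Aggregate per-prompt judge verdicts for the candidate model
    vs. a fixed baseline into a single score; ties count as half."""
    score = {"win": 1.0, "tie": 0.5, "loss": 0.0}
    return sum(score[v] for v in verdicts) / len(verdicts)

print(win_rate(["win", "tie", "loss", "win"]))  # 0.625
```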
LMSYS ChatBot Arena
A large-scale dataset of real-world LLM conversations collected on the LMSYS Chatbot Arena platform, used to study and evaluate LLMs on in-the-wild usage.
WizardArena
A large-scale conversational dataset used to train and evaluate the WizardLM-β model.
PokeMQA: Programmable knowledge editing for Multi-hop Question Answering
Multi-hop question answering (MQA) is one of the challenging tasks to evaluate machine’s comprehension and reasoning abilities, where large language models (LLMs) have widely...