Dataset - LDM

NLPbench

The dataset is used for evaluating large language models on solving NLP problems.
- Dataset
- JSON
DEEP

Detecting Errors through Ensembling Prompts (DEEP): an end-to-end large language model framework for detecting factual errors in text summarization.
- Dataset
- JSON
FP6-LLM: Efficiently Serving Large Language Models Through FP6-Centric Algori...

Six-bit quantization can effectively reduce the size of large language models while consistently preserving model quality across varied applications.
- Dataset
- JSON
mC4

Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there...
- Dataset
- JSON
Fairness Certification for Natural Language Processing and Large Language Models

A large text corpus used in the paper to train and evaluate natural language processing models.
- Dataset
- JSON
EntityQ

The dataset used to evaluate the Thread of Thought (ThoT) strategy, which is designed to enhance the performance of Large Language Models (LLMs) in processing chaotic contextual...
- Dataset
- JSON
Integer or floating point? New outlooks for low-bit quantization on large lan...

The paper does not describe its dataset explicitly; it is referred to only as a large language model dataset.
- Dataset
- JSON
A comprehensive study on post-training quantization for large language models

The ZeroQuant dataset is a large language model dataset used in the paper.
- Dataset
- JSON
OPT: Open pre-trained transformer language models

The OPT dataset is a large language model dataset used in the paper.
- Dataset
- JSON
ZeroQuant-FP: A Leap Forward in LLMs Post-Training W4A8 Quantization Using Fl...

The paper does not describe its dataset explicitly; it is referred to only as a large language model dataset.
- Dataset
- JSON
WebWISE: Web Interface Control and Sequential Exploration with Large Language...

The paper investigates using a Large Language Model (LLM) to automatically perform web software tasks using click, scroll, and text input operations.
- Dataset
- JSON
Modality-Aware Integration with Large Language Models for Knowledge-based Vis...

Knowledge-based visual question answering (KVQA) has been extensively studied to answer visual questions with external knowledge, e.g., knowledge graphs (KGs).
- Dataset
- JSON
Knowledge Graph-Enhanced Large Language Models via Path Selection

Two datasets, MetaQA and FACTKG, are used to evaluate the effectiveness of the proposed method KELP. MetaQA is a critical benchmark dataset containing subsets of questions with...
- Dataset
- JSON
LiveCodeBench

LiveCodeBench is a benchmark for evaluating the performance of Large Language Models (LLMs) in code editing tasks, including debugging, translating, polishing, and requirement...
- Dataset
- JSON
Towards Expert-Level Medical Question Answering with Large Language Models

A large-scale dataset for medical question answering with large language models.
- Dataset
- JSON
LLaVA-Instruct-150k

Visual question answering dataset
- Dataset
- JSON
ReasonDet

Reasoning detection dataset for multimodal large language models
- Dataset
- JSON
TRICOTS

A tool for collecting traces from any Python codebase that uses OpenAI’s API.
- Dataset
- JSON
MACHIAVELLI Benchmark

A dataset of traces from the MACHIAVELLI environment, including API calls and their outcomes.
- Dataset
- JSON
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM ...

A structured collection of tests for input-output safeguards, including established failure tests, emerging failure tests, and next-gen architecture tests.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

45 datasets found