Dataset - LDM

GTA-V

The GTA-V dataset is a large-scale dataset of driving scenes, consisting of 56 training videos and 21 test videos.
- Dataset
- JSON
Dataset D

The dataset is used for blood glucose prediction based on continuous glucose monitoring (CGM) device readings.
- Dataset
- JSON
SEAME

A dataset of Mandarin-English code-switched corpora, used for the code-switched speech recognition task.
- Dataset
- JSON
nvBench-Rob(nlq,schema)

The nvBench-Rob(nlq,schema) dataset is a testing set from nvBench-Rob, containing both NLQ variants and data schema variants, specifically designed to test the robustness of...
- Dataset
- JSON
nvBench-Rob(schema)

The nvBench-Rob(schema) dataset is a testing set from nvBench-Rob, containing only data schema variants, specifically designed to test the robustness of models against data schema...
- Dataset
- JSON
nvBench-Rob(nlq)

The nvBench-Rob(nlq) dataset is a testing set from nvBench-Rob, containing only NLQ variants, specifically designed to test the robustness of models against NLQ variants.
- Dataset
- JSON
nvBench

The nvBench dataset is a benchmark for text-to-vis models, containing natural language questions and their corresponding data visualizations.
- Dataset
- JSON
nvBench-Rob

The nvBench-Rob dataset is a comprehensive robustness evaluation dataset for text-to-vis models, containing diverse lexical and phrasal variations based on the original...
- Dataset
- JSON
Hyperspectral Unmixing Dataset

A laboratory-created dataset with ground truth for evaluating hyperspectral unmixing.
- Dataset
- JSON
WebLI

A dataset used in the paper for subject-driven text-to-image synthesis.
- Dataset
- JSON
BREEDS dataset

The BREEDS dataset is used for evaluating the proposed method.
- Dataset
- JSON
iNaturalist 2018 dataset

iNaturalist 2018 is a real-world, large-scale, imbalanced dataset.
- Dataset
- JSON
Rotowire

The Rotowire dataset, as used in the paper.
- Dataset
- JSON
MBPP

A dataset used in the paper for code generation.
- Dataset
- JSON
HumanEval

HumanEval is a dataset used to evaluate the performance of language models.
- Dataset
- JSON
CiteSeerX Name Disambiguation Dataset

The dataset contains 10 highly ambiguous name references with 1091 documents and 74 distinct real-life authors.
- Dataset
- JSON
Arnetminer Name Disambiguation Dataset

The dataset contains 10 highly ambiguous name references with 1091 documents and 74 distinct real-life authors.
- Dataset
- JSON
mC4

Parameter-efficient fine-tuning (PEFT) using labeled task data can significantly improve the performance of large language models (LLMs) on the downstream task. However, there...
- Dataset
- JSON
AstraZeneca Global Cell Bank - Brightfield Imaging Dataset (AZGCB-BFID)

A large-scale dataset of 165190 brightfield images of 32 different cell lines across 93 experimental batches.
- Dataset
- JSON
LinkedIn

The LinkedIn dataset contains leaked passwords and is used for training and testing the proposed model.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

302 datasets found