Dataset - LDM

Tox21 Toxicity Dataset

The Tox21 dataset contains information about the toxicity of various compounds, used for toxicity prediction tasks.
- Dataset
- JSON
ChEMBL Bioactivity Dataset

The ChEMBL dataset is used for drug bioactivity prediction across multiple tasks involving human protein targets.
- Dataset
- JSON
13C NMR Spectra Dataset

The dataset consists of 13C NMR spectra from NMRShiftDB, used for predicting NMR peaks for carbon atoms in various molecules.
- Dataset
- JSON
Covid-Chestxray-Dataset

The Covid-Chestxray-Dataset contains a collection of chest X-ray images of COVID-19 patients, used for training and testing in the study.
- Dataset
- JSON
Indoor Object Dataset

A synthetic dataset consisting of rendered indoor scene images with masked objects as the foreground and real-world photographs to validate whether ST-GAN generalizes to real...
- Dataset
- JSON
Authentic Paintings and Sketches Dataset

A dataset of realistic face images and sketches collected from art galleries to evaluate the method's robustness to various styles.
- Dataset
- JSON
Synthesized Stylized Face and Ground Truth Dataset

The dataset consists of pairs of stylized face images and their corresponding ground truth photorealistic faces, created using the CelebA dataset and various style transfer...
- Dataset
- JSON
WMT 2014 English → German

The WMT 2014 dataset contains 4.5M sentence pairs for machine translation from English to German.
- Dataset
- JSON
WMT 2016 Romanian → English

The WMT 2016 dataset comprises 600K sentence pairs for machine translation from Romanian to English.
- Dataset
- JSON
KFTT Japanese → English

The KFTT dataset includes 300K sentence pairs for machine translation from Japanese to English.
- Dataset
- JSON
IWSLT 2017 German → English

The IWSLT 2017 dataset consists of 200K sentence pairs for machine translation from German to English.
- Dataset
- JSON
Cadaver X-ray Images

The dataset includes 10 real X-ray images collected from a cadaver specimen, with ground truth poses obtained by injecting metallic BBs into the surface of the bone, manually...
- Dataset
- JSON
NIH Cancer Imaging Archive CT Scans

The dataset consists of 17 full-body CT scans from the NIH Cancer Imaging Archive used for training, with one CT scan reserved for testing. The pelvis bone was segmented from...
- Dataset
- JSON
Massively Multilingual Machine Translation Dataset

A corpus of parallel documents over 102 languages and English, containing 25 billion training examples across a diverse set of languages used for multilingual neural machine...
- Dataset
- JSON
ELI5

ELI5 is a long-form question answering dataset where the answers are generated based on the concatenation of a question and relevant supporting documents.
- Dataset
- JSON
Downscaled ImageNet

Downscaled ImageNet is a modified version of the standard ImageNet dataset, with images reduced in size and fewer classes, intended for training models efficiently.
- Dataset
- JSON
Partial-iLIDS

The Partial-iLIDS dataset is a simulated partial dataset based on iLIDS, consisting of 238 images of 119 persons captured by multiple non-overlapping cameras.
- Dataset
- JSON
Partial-ReID

The Partial-ReID dataset includes 600 images of 60 persons, with 5 full-body images and 5 partial images per person, collected at a university campus with various viewpoints and...
- Dataset
- JSON
Cornell eRulemaking Corpus (CDCP)

The Cornell eRulemaking Corpus (CDCP) consists of 731 user comments collected from an eRulemaking website, totaling about 4,700 propositions, all considered argumentative. The...
- Dataset
- JSON
Popi Dataset

The Popi dataset consists of six 4D CTs showing the lung region, each including 10 3D CTs and landmarks for registration evaluation.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

20,491 datasets found