Dataset - LDM

SRBench

A large benchmarking effort that includes a dataset repository curated for SR, as well as a benchmarking library designed to allow researchers to easily contribute methods.
- Dataset
- JSON
ICLP 2019 Dataset

The dataset contains 122 abstract submissions, 92 full submissions, and 30 accepted papers.
- Dataset
- JSON
Netflix Dataset

The dataset used in the paper is a Netflix dataset, which is a large-scale matrix factorization problem.
- Dataset
- JSON
AMP sequence dataset

The dataset contains 6438 known AMP sequences and 9522 non-AMP sequences from the DBAASP database.
- Dataset
- JSON
CRESCI2017

The CRESCI2017 dataset is a collection of Twitter data, including user information, event logs, and tweets. It contains four types of accounts: genuine users, social bots,...
- Dataset
- JSON
Taobao Dataset

Precise user modeling is critical for online personalized recommender services. Generally, users’ interests are diverse and are not limited to a single aspect, which is...
- Dataset
- JSON
Spacecraft Pose Estimation Dataset (SPEED)

The SPEED dataset is used for training and evaluating spacecraft pose estimation algorithms.
- Dataset
- JSON
TMID: A Comprehensive Real-world Dataset for Trademark Infringement Detection...

A comprehensive real-world dataset for trademark infringement detection in e-commerce, sourced directly from Alipay, one of the world's largest e-commerce and digital payment...
- Dataset
- JSON
TVMI3K

A large-scale microscope image dataset of Trichomonas vaginalis for segmentation, consisting of 3,158 images with high-quality annotations.
- Dataset
- JSON
CCCS-CIC-AndMal-2020

The CCCS-CIC-AndMal-2020 dataset comprises 400K Android apps and is used to test and assess the proposed methodology.
- Dataset
- JSON
comma2k19 dataset

The comma2k19 dataset is used to evaluate the robustness of lane detection models under physical-world adversarial attacks in autonomous driving.
- Dataset
- JSON
News Recommendation Dataset

The News Recommendation Dataset is a real-world dataset used to evaluate the performance of the EENR framework.
- Dataset
- JSON
Chinese Event Extraction Dataset

The Chinese Event Extraction Dataset is used to train the event extraction (EE) module; the News Recommendation Dataset is the target recommendation dataset.
- Dataset
- JSON
Mr. TyDi

The Mr. TyDi dataset is a multilingual dataset for dense retrieval, consisting of 100,000 passages and 1,000,000 queries.
- Dataset
- JSON
MIRACL

The MIRACL dataset is a unique resource for researchers working on search across multiple languages. It covers 18 different languages, each of which is divided into four parts:...
- Dataset
- JSON
Colosseum dataset

The Colosseum dataset is a flow-level network traffic dataset used to train and test the traffic steering algorithm.
- Dataset
- JSON
Phi-2: A Dataset for Language Model Evaluation

The Phi-2 dataset is a collection of language models used to evaluate the performance of language models.
- Dataset
- JSON
MBPP: A Dataset for Language Model Evaluation

The MBPP dataset is a collection of basic programming questions used to evaluate the performance of language models.
- Dataset
- JSON
Traffic Light Control Dataset

The traffic light control dataset is used to evaluate the performance of reinforcement learning models in traffic light control.
- Dataset
- JSON
APPS: A Dataset for Code Generation Evaluation

The APPS dataset is a collection of programming problems used to evaluate the performance of code generation models.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

227 datasets found