Dataset - LDM

Dataset for Hashtag Recommendation Evaluation

Dataset for evaluating hashtag recommendation methods
- Dataset
- JSON
nvBench-Rob(nlq,schema)

The nvBench-Rob(nlq,schema) dataset is a testing set from nvBench-Rob, containing both NLQ variants and data schema variants, specifically designed to test the robustness of...
- Dataset
- JSON
nvBench-Robschema

The nvBench-Robschema dataset is a testing set from nvBench-Rob, containing only data schema variants, specifically designed to test the robustness of models against data schema...
- Dataset
- JSON
nvBench-Robnlq

The nvBench-Robnlq dataset is a testing set from nvBench-Rob, containing only NLQ variants, specifically designed to test the robustness of models against NLQ variants.
- Dataset
- JSON
nvBench-Rob

The nvBench-Rob dataset is a comprehensive robustness evaluation dataset for text-to-vis models, containing diverse lexical and phrasal variations based on the original...
- Dataset
- JSON
Human-Centered IML Systems

A dataset for designing and evaluating human-centered IML systems
- Dataset
- JSON
Rotowire

The dataset used in the paper for Rotowire
- Dataset
- JSON
MBPP

The dataset used in the paper for code generation
- Dataset
- JSON
HumanEval

The paper uses the HumanEval dataset to evaluate the performance of language models.
- Dataset
- JSON
DLGNet

A multi-turn dialogue response generator that was evaluated using automatic metrics.
- Dataset
- JSON
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks

Automatically evaluating vision-language tasks is challenging, especially when it comes to reflecting human judgments due to limitations in accounting for fine-grained details.
- Dataset
- JSON
Expert Demonstrations

The expert demonstrations are generated according to the given optimal policy for the recovery. The length of each expert demonstration is 5-grid size trajectory length. Four...
- Dataset
- JSON
AI Quadrant Dataset

The dataset used for evaluating the performance of AI software, with different levels of smartness and automation.
- Dataset
- JSON
CYBERSECEVAL 2

A wide-ranging cybersecurity evaluation suite for large language models.
- Dataset
- JSON
GridWorld and BlockDude Domains

The GridWorld and BlockDude domains were used to evaluate the proposed task sequencing framework.
- Dataset
- JSON
Quantile Off-Policy Evaluation via Deep Conditional Generative Learning

The dataset used in this paper for quantile off-policy evaluation via deep conditional generative learning.
- Dataset
- JSON
TACRED Revisited: A Thorough Evaluation of the TACRED Relation Extraction Task

TACRED revisited: A thorough evaluation of the TACRED relation extraction task.
- Dataset
- JSON
NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evalu...

The NATURE dataset is a set of simple spoken-language oriented transformations, applied to the evaluation set of datasets, to introduce human spoken language variations while...
- Dataset
- JSON
Pedestrian Detection: An Evaluation of the State of the Art

Pedestrian detection: An evaluation of the state of the art.
- Dataset
- JSON
SAMSum

The SAMSum dataset is a benchmark for automatic summarization evaluation, containing dialogue summaries and their associated reference summaries.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
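
The listing does not describe the API itself, so the following is only a minimal sketch of what a programmatic search might look like. It assumes the registry exposes a CKAN-style Action API (the `package_search` endpoint and its `q` and `rows` parameters); that assumption is not confirmed by this page. The base URL `https://registry.example.org` and the helper names `build_search_url`, `summarize`, and `search` are hypothetical. Check the API Docs for the actual endpoints before relying on any of this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL: replace with the registry's real address.
BASE_URL = "https://registry.example.org"


def build_search_url(query, rows=20, base_url=BASE_URL):
    """Build a CKAN-style package_search URL (assumed endpoint, see API Docs)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def summarize(payload):
    """Return (total match count, dataset titles) from a CKAN-style response."""
    result = payload["result"]
    # Fall back to the machine name when a record has no display title.
    titles = [pkg.get("title", pkg.get("name", "")) for pkg in result["results"]]
    return result["count"], titles


def search(query, rows=20, base_url=BASE_URL):
    """Run the search over HTTP and summarize the JSON response."""
    with urlopen(build_search_url(query, rows, base_url)) as resp:
        return summarize(json.load(resp))
```

Under these assumptions, `search("evaluation")` would return the total number of matching datasets together with the titles on the first page of results. The page size is controlled by `rows`, so the number of titles returned can be smaller than the total count.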

31 datasets found