Dataset - LDM

WikiSum Evaluation Dataset

A dataset of 1,527 Wikipedia biographies of women, a domain where information is less easily retrieved from the internet.
- Dataset
- JSON
HumanEval

The dataset used in the paper is HumanEval, a benchmark of programming problems for evaluating the code-generation performance of language models.
- Dataset
- JSON
Problems with evaluation of word embeddings using word similarity tasks

This dataset has no description
- Dataset
- JSON
USR-TopicalChat

This dataset is used for dialogue response evaluation.
- Dataset
- JSON
USR-PersonaChat

This dataset is used for dialogue response evaluation.
- Dataset
- JSON
Dailydialog-Eval

This dataset is used for dialogue response evaluation.
- Dataset
- JSON
FlyingChairs

A dataset for optical flow evaluation, including a naturalistic open source movie.
- Dataset
- JSON
Open Graph Benchmark: Datasets for Machine Learning on Graphs

Open Graph Benchmark: Datasets for machine learning on graphs.
- Dataset
- JSON
Synthetic Networks

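The dataset used in the paper consists of synthetic networks generated under four network models: the stochastic block model (SBM), the degree-corrected block model (DCBM), the random dot product graph (RDPG), and the latent space model.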
- Dataset
- JSON
Contact Network of Secondary School Students

The dataset used in the paper is a collection of real and synthetic networks for community detection evaluation.
- Dataset
- JSON
Scientific Collaboration Networks

The dataset used in the paper is a collection of real and synthetic networks for community detection evaluation.
- Dataset
- JSON
Community Detection Evaluation

The dataset used in the paper is a collection of real and synthetic networks for community detection evaluation.
- Dataset
- JSON
A Modularized Evaluation for Topic Popularity Prediction

Topic popularity prediction in social networks has drawn much attention recently. Various elegant models have been proposed for this issue. However, different datasets and...
- Dataset
- JSON
FACTOR

The dataset used in this paper is FACTOR, a benchmark for factuality evaluation of language models.
- Dataset
- JSON
HaluEval-Sum

The dataset used in this paper is HaluEval-Sum, a large-scale hallucination evaluation benchmark for large language models.
- Dataset
- JSON
BEIR

The BEIR dataset is a large-scale zero-shot evaluation dataset for information retrieval models, consisting of 13,000 documents and 1,000 questions.
- Dataset
- JSON
Text Simplification Datasets: Exploration

Text Simplification datasets have limitations and need to be improved to build more robust models.
- Dataset
- JSON
GPT-4 Evaluation Dataset

A dataset used to evaluate GPT-4's performance on systematic review tasks.
- Dataset
- JSON
Pitfalls of graph neural network evaluation

Pitfalls of graph neural network evaluation
- Dataset
- JSON
Surprise Test Set

The surprise test set is used for evaluating the performance of the proposed system.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).
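
As a rough illustration of programmatic access, the sketch below assumes the registry exposes a CKAN-style Action API with a `package_search` endpoint; this page does not confirm that, so check the API Docs first. The base URL `https://registry.example.org` and the helper names `build_search_url` and `extract_titles` are hypothetical, and the sample payload is hand-written to mimic the assumed response shape rather than taken from the registry.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL; replace with the registry's real address from its API docs.
BASE_URL = "https://registry.example.org"


def build_search_url(query, rows=10, base_url=BASE_URL):
    """Build a search URL, assuming a CKAN-style package_search endpoint."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def extract_titles(payload):
    """Return (count, titles) from a CKAN-style search response body."""
    data = json.loads(payload)
    if not data.get("success"):
        raise ValueError("API reported failure")
    result = data["result"]
    return result["count"], [pkg["title"] for pkg in result["results"]]


# Offline sample shaped like the assumed response; not real registry output.
sample = json.dumps({
    "success": True,
    "result": {"count": 2, "results": [{"title": "FACTOR"}, {"title": "BEIR"}]},
})

print(build_search_url("evaluation"))
print(extract_titles(sample))
```

To query a live instance, the URL from `build_search_url` could be fetched with `urllib.request.urlopen` and the decoded body passed to `extract_titles`; the network call is left out here so the sketch runs offline.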

28 datasets found