MACHIAVELLI Benchmark
A dataset of traces from the MACHIAVELLI environment, including API calls and their outcomes.
BELLS: A Framework Towards Future Proof Benchmarks for the Evaluation of LLM ...
A structured collection of tests for input-output safeguards, including established-failure tests, emerging-failure tests, and next-generation architecture tests.
AstroMLab 1: Who Wins Astronomy Jeopardy!?
A comprehensive evaluation of proprietary and open-weight large language models using the first astronomy-specific benchmarking dataset.
Benchmark problems for gray-box optimization
A set of benchmark problems for gray-box optimization, including the Sphere, Rosenbrock, and REBGrid functions, among others.
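For readers unfamiliar with the named test functions, a minimal Python sketch of the standard Sphere and Rosenbrock definitions follows. It is not taken from the benchmark's own code, and the function names are hypothetical. REBGrid is omitted because its definition is not given here. Splitting Rosenbrock into per-pair terms is an assumption about what "gray-box" means in this setting: the optimizer may see that the objective is a sum of terms, each depending on only two adjacent variables.

```python
from typing import Sequence


def sphere(x: Sequence[float]) -> float:
    """Sphere function: sum of squares; global minimum 0 at the origin."""
    return sum(xi * xi for xi in x)


def rosenbrock_terms(x: Sequence[float]) -> list[float]:
    """Per-pair terms of the Rosenbrock function (hypothetical helper).

    Each term depends only on the adjacent variables x[i] and x[i + 1],
    the kind of additive structure a gray-box optimizer is assumed to exploit.
    """
    return [
        100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
        for i in range(len(x) - 1)
    ]


def rosenbrock(x: Sequence[float]) -> float:
    """Rosenbrock function; global minimum 0 at x = (1, ..., 1)."""
    return sum(rosenbrock_terms(x))


if __name__ == "__main__":
    print(sphere([1.0, 2.0, 3.0]))      # 14.0
    print(rosenbrock([1.0, 1.0, 1.0]))  # 0.0
    print(rosenbrock([0.0, 0.0]))       # 1.0
```

A black-box optimizer would see only the value returned by `rosenbrock`; a gray-box one could also use the list from `rosenbrock_terms` to tell which variables interact.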
DiLiGenT Benchmark
A benchmark dataset for non-Lambertian and uncalibrated photometric stereo.
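As background on the terms "non-Lambertian" and "uncalibrated," here is a minimal sketch of classical calibrated Lambertian photometric stereo, the baseline setting that such a benchmark departs from. This is not DiLiGenT's evaluation code. It assumes a single pixel, exactly three known (calibrated) light directions, no shadows, and intensity I = albedo * (l · n), so the albedo-scaled normal g = albedo * n solves a 3x3 linear system. The function names are hypothetical. Real non-Lambertian surfaces (specular, glossy) break the linear model, and in the uncalibrated case the light directions themselves are unknown.

```python
import math
from typing import Sequence

Vec3 = tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix given as a sequence of rows."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def lambertian_normal(lights: Sequence[Vec3], intensities: Vec3) -> tuple[float, Vec3]:
    """Solve L g = I for g = albedo * n using Cramer's rule (hypothetical helper).

    lights: three unit light directions, one per row of L.
    intensities: observed brightness of one pixel under each light.
    Returns (albedo, unit surface normal).
    """
    d = _det3(lights)
    if abs(d) < 1e-12:
        raise ValueError("light directions must be linearly independent")
    g = []
    for col in range(3):
        # Replace column `col` of L with the intensity vector (Cramer's rule).
        m = [[intensities[r] if c == col else lights[r][c] for c in range(3)]
             for r in range(3)]
        g.append(_det3(m) / d)
    albedo = math.sqrt(sum(v * v for v in g))
    if albedo == 0.0:
        raise ValueError("all intensities are zero; normal is undefined")
    return albedo, (g[0] / albedo, g[1] / albedo, g[2] / albedo)


if __name__ == "__main__":
    L = [(0.0, 0.0, 1.0), (0.6, 0.0, 0.8), (0.0, 0.6, 0.8)]
    # Surface facing +z with albedo 0.8 under the three lights above.
    print(lambertian_normal(L, (0.8, 0.64, 0.64)))
```

Cramer's rule keeps the sketch dependency-free for the three-light case; with more lights one would instead solve the overdetermined system by least squares.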
WebQuestions
The task of Question Answering over Linked Data (QALD) has received increasing attention in recent years (see the surveys [14] and [36]). The task consists of mapping natural...
CEC'2013 Special Session and Competition on Large-Scale Global Optimization
A benchmark for large-scale global optimization, featuring composite functions with varying sizes and complexities.
OptimSuite
A broad benchmark suite for black-box optimization, covering a wide range of problems, including academic benchmarks, real-world applications, and discrete optimization problems.
YCB Object and Model Set
The YCB object and model set is a benchmark for manipulation research, consisting of 15 object categories and 3D models.
Benchmarking single-image dehazing and beyond
MoleculeNet dataset
The MoleculeNet dataset is a benchmarking platform for molecular machine learning.
ModelNet40
Point cloud registration is a crucial problem in computer vision and robotics. Existing methods either rely on matching local geometric features, which are sensitive to the pose...