-
Experimental materials dataset
A dataset of 2104 experimental materials, including their crystal structure information, used to train a Variational Autoencoder (VAE) model. -
Distributed Stochastic Gradient Descent (DSGD) dataset
The dataset used in the paper is a network of n nodes, where each node i holds a local copy z_i of the decision vector. -
NowcastNet dataset
The dataset used for benchmarking machine learning force fields (MLFF) and jet tagging. -
MRMS dataset
The dataset used for benchmarking machine learning force fields (MLFF) and jet tagging. -
TopLandscape dataset
The dataset used for benchmarking machine learning force fields (MLFF) and jet tagging. -
SAIBench: A Structural Interpretation of AI for Science Through Benchmarks
The dataset used for benchmarking machine learning force fields (MLFF) and jet tagging. -
Two-Layer Neural Networks
The dataset is used to analyze the convergence of stochastic gradient descent for two-layer neural networks. -
SVIRO: Synthetic Vehicle Interior Rear Seat Occupancy
A synthetic dataset for sceneries in the passenger compartment of ten different vehicles, to analyze machine learning-based approaches for their generalization capacities and... -
Diabetes dataset
The Diabetes dataset contains 442 samples, each with 10 feature variables and a target (label) variable that quantifies diabetes progression. -
Porous Organic Cages
The dataset used for machine learning accelerated discovery of porous organic cages. -
Metal-Organic Frameworks
The dataset used for machine learning accelerated discovery of metal-organic frameworks. -
Transition-metal complexes
The dataset used for machine learning accelerated discovery of transition-metal complexes. -
Grassmann Manifold
The dataset used in the paper is a collection of data points from the Grassmann manifold. -
Qualitative detection of oil adulteration with machine learning approaches
The study focused on the machine learning analysis approaches to identify the adulteration of 9 kinds of edible oil qualitatively and answered the following three questions: Is... -
ClimSim: A large multi-scale dataset for hybrid physics-ML climate emulation
ClimSim is a large multi-scale dataset for hybrid physics-ML climate emulation. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and... -
Language-assisted Vision Model Debugger
Vision models with high overall accuracy often exhibit systematic errors on some important subsets of data, posing potential serious safety concerns. Diagnosing such bugs of... -
COVID-19 TCR Repertoire Dataset
The dataset used in this paper is a collection of T-cell receptor (TCR) repertoires from COVID-19 patients and healthy controls. The dataset is used to train and evaluate the... -
California Housing Dataset
The California Housing Dataset is a dataset containing information about housing prices in California, with nine features and a target variable of median house price. -
Martian time-series unraveled: A multi-scale nested approach with factorial v...
Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only... -
Ising configurations and RBM
The dataset consists of spin configurations sampled at various temperatures, both above and below the critical temperature.
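
As a rough illustration of how spin configurations on either side of the critical temperature could be produced, the sketch below runs single-spin-flip Metropolis sampling. It is not taken from the dataset's source: the 2D square lattice, the coupling J = 1, the units with k_B = 1, the lattice size, the sweep count, the two temperatures, and the helper name `sample_ising` are all assumptions made for the example.

```python
import numpy as np

# Assumption: 2D square-lattice Ising model with J = 1 and k_B = 1, for which
# the exact critical temperature is T_c = 2 / ln(1 + sqrt(2)), about 2.269.
T_C = 2.0 / np.log(1.0 + np.sqrt(2.0))


def sample_ising(size, temperature, sweeps, rng):
    """Return one size x size spin configuration after Metropolis sweeps.

    Hypothetical helper for illustration; not the dataset's own generator.
    """
    spins = np.ones((size, size), dtype=np.int8)  # cold start: all spins up
    for _ in range(sweeps * size * size):
        i, j = rng.integers(0, size, size=2)
        # Sum of the four nearest neighbours with periodic boundaries.
        neighbours = (
            spins[(i + 1) % size, j] + spins[(i - 1) % size, j]
            + spins[i, (j + 1) % size] + spins[i, (j - 1) % size]
        )
        delta_e = 2.0 * spins[i, j] * neighbours  # energy change if flipped
        # Metropolis rule: always accept downhill moves, otherwise accept
        # with probability exp(-delta_e / T).
        if delta_e <= 0 or rng.random() < np.exp(-delta_e / temperature):
            spins[i, j] = -spins[i, j]
    return spins


rng = np.random.default_rng(0)
temperatures = [1.5, 3.5]  # assumed values, one below and one above T_c
configs = {t: sample_ising(16, t, sweeps=200, rng=rng) for t in temperatures}
```

Under these assumptions, the configuration sampled below T_c should remain largely aligned, while the one sampled above T_c should look disordered. A real dataset would typically store many configurations per temperature rather than one.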