-
The Surprising Harmfulness of Benign Overfitting for Adversarial Robustness
The dataset is used to study the relationship between benign overfitting and adversarial robustness in machine learning models. -
A general theoretical paradigm to understand learning from human preferences
The paper proposes a novel approach to aligning language models with human preferences, focusing on the use of preference optimization in reward-free RLHF. -
Machine Learning and Bioinformatics for Diagnosis Analysis of Obesity Spectru...
A dataset used for the diagnostic analysis of obesity spectrum disorders. -
ArduCode: Predictive Framework for Automation Engineering
Two real datasets consisting of 2,927 Arduino projects and 683 Programmable Logic Controller (PLC) projects. -
Accelerating Deep Learning with Shrinkage and Recall
Deep learning is a very powerful machine learning approach. It trains a large number of parameters across multiple layers and is very slow when the data is large-scale and... -
Llama: Open and efficient foundation language models
The LLaMA dataset is a large language model dataset used in the paper. -
Colored MNIST dataset
The dataset used in the paper is a binary classification task in a 300-dimensional space. The procedure for generating the training dataset is as follows: Each label y ∈ {−1, 1}... -
Automatic Chemical Design Using a Data-Driven Continuous Representation of Mo...
A dataset for automatic chemical design using a data-driven continuous representation of molecules. -
LIME and SHAP explanations for issue type predictions
The dataset contains 3092 issues, each with the machine learning models' prediction of whether it is a bug and the corresponding LIME and SHAP explanations. -
Fusarium head blight detection in wheat under field conditions
A dataset used for detecting Fusarium head blight in wheat under field conditions using a hyperspectral camera and machine learning. -
AGRE ADOS database, Kaggle database, and self-gathered video test dataset
The AGRE ADOS database, a Kaggle database, and a self-gathered video test dataset with corresponding ADOS data. -
Malware Classification Dataset
The dataset used in this paper is a malware dataset containing 10,896 malware files belonging to 9 different malware families. -
Implicit Multigrid-Augmented DL for the Helmholtz Equation
The dataset used in this paper is a collection of slowness models for the Helmholtz equation, generated from the CIFAR-10, OpenFWI Style-A, and STL-10 datasets. -
3-d Gaussian Data
The dataset used in the paper is drawn from a 3-d Gaussian distribution. -
Component Decoupled Data
The dataset used in the paper is a synthetic dataset generated by the component decoupled model described in Section 3. -
Malware dataset
The dataset consists of 20 malware families. Three of these malware families, namely, Winwebsec, Zeroaccess, and Zbot, are from the Malicia dataset, while the remaining 17... -
Gradient-based learning applied to document recognition
Gradient-based learning applied to document recognition. -
BEND: Bagging Deep Learning Training Based on Efficient Neural Network Diffusion
The paper proposes a Bagging Deep Learning Training Framework (BEND) based on efficient neural network diffusion. -
Koopcon: A new approach towards smarter and less complex learning
The dataset condensation problem involves transforming a large-scale training set X into a smaller synthetic set X'. -
Decoy-MNIST
The dataset used in the paper is a synthetic dataset with induced shortcuts, similar to the decoy-MNIST of Ross et al. (2017); it is presented in Section 5.2. For evaluation on...