Colored MNIST dataset

doi:10.57702/vhvkfs6c

The dataset used in the paper is a binary classification task in a 300-dimensional space. The training set is generated as follows. Each label y ∈ {−1, 1} is sampled uniformly at random. The first component x1 is sampled from a mixture of two Gaussian distributions, each with a variance of 0.15, centered at y and 1 − y respectively, with mixing proportions of 0.9 and 0.1. As the training dataset size increases, the model's ability to learn this feature improves, which in turn improves test accuracy. The remaining 299 dimensions (x2, ..., x300) are drawn from a zero-mean normal distribution with a variance of 0.1. They constitute the nuisance subspace, which is primarily used to memorise label noise.
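The generation procedure described above can be sketched in a few lines of NumPy. This is only an illustrative reading of the description, not the authors' code: the function name `make_dataset`, its parameters, and the seed handling are hypothetical, and the sketch assumes that the 0.9 mixing proportion belongs to the component centered at y and that the nuisance coordinates are sampled independently of each other.

```python
import numpy as np


def make_dataset(n, dim=300, seed=0):
    """Sample n points following the description above (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Labels are uniform over {-1, +1}.
    y = rng.choice([-1, 1], size=n)

    # Assumption: with probability 0.9 the first feature is centered at y,
    # otherwise at 1 - y, as the description states.
    use_main = rng.random(n) < 0.9
    centers = np.where(use_main, y, 1 - y).astype(float)

    # Variance 0.15 corresponds to a standard deviation of sqrt(0.15).
    x1 = centers + np.sqrt(0.15) * rng.standard_normal(n)

    # Nuisance subspace: zero mean, variance 0.1, assumed independent per coordinate.
    nuisance = np.sqrt(0.1) * rng.standard_normal((n, dim - 1))

    X = np.column_stack([x1, nuisance])
    return X, y


X, y = make_dataset(2000, seed=1)
```

Here `X` has one row per sample, with the first column holding x1 and the remaining 299 columns holding the nuisance coordinates. Fixing the seed makes the draw reproducible.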

Data and Resources

Original Metadata (JSON)
The json representation of the dataset with its distributions based on DCAT.

Cite this as

Borja Rodríguez-Gálvez, Ragnar Thobaben, Mikael Skoglund (2024). Dataset: Colored MNIST dataset. https://doi.org/10.57702/vhvkfs6c

DOI retrieved: December 16, 2024

Additional Info

Field	Value
Created	December 16, 2024
Last update	December 16, 2024
Defined In	https://doi.org/10.48550/arXiv.2406.19049
Citation	https://doi.org/10.48550/arXiv.1811.00073 https://doi.org/10.48550/arXiv.2006.06332
Author	Borja Rodríguez-Gálvez
More Authors	Ragnar Thobaben, Mikael Skoglund
Homepage	https://arxiv.org/abs/1904.01059