Colored MNIST dataset

doi:10.57702/vhvkfs6c

The dataset used in the paper defines a binary classification task in a 300-dimensional space. The training set is generated as follows. Each label y ∈ {−1, 1} is sampled uniformly at random. The first component x1 is sampled from a mixture of two Gaussian distributions, each with a variance of 0.15, centered at y and 1 − y with mixing proportions of 0.9 and 0.1, respectively. As the training set grows, the model’s ability to learn this feature improves, which in turn improves the test accuracy. The remaining 299 dimensions (x2, ..., x300) are drawn from a zero-mean normal distribution with a variance of 0.1. They constitute the nuisance subspace, which is primarily used to memorise label noise.
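
The generation procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: the function name `make_dataset`, its parameters, and the seed handling are assumptions, and the mixture centers are taken literally from the description (y with probability 0.9, 1 − y with probability 0.1).

```python
import numpy as np


def make_dataset(n, dim=300, seed=0):
    """Sample n points following the textual description (hypothetical helper)."""
    rng = np.random.default_rng(seed)

    # Labels y in {-1, 1}, sampled uniformly at random.
    y = rng.choice([-1, 1], size=n)

    # First component: two-Gaussian mixture with variance 0.15.
    # With probability 0.9 the center is y, otherwise 1 - y (as written in the description).
    use_main = rng.random(n) < 0.9
    center = np.where(use_main, y, 1 - y).astype(float)
    x1 = rng.normal(loc=center, scale=np.sqrt(0.15))

    # Remaining dim - 1 components: zero-mean Gaussian noise with variance 0.1.
    nuisance = rng.normal(loc=0.0, scale=np.sqrt(0.1), size=(n, dim - 1))

    X = np.column_stack([x1, nuisance])
    return X, y


X, y = make_dataset(1000)
```

Here `X` has one row per sample, with the mixture feature in column 0 and the 299 nuisance dimensions in the remaining columns.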

BibTeX:

@dataset{Borja_Rodríguez-Gálvez_and_Ragnar_Thobaben_and_Mikael_Skoglund_2024,
    abstract = {The dataset used in the paper defines a binary classification task in a 300-dimensional space. The training set is generated as follows. Each label y ∈ {−1, 1} is sampled uniformly at random. The first component x1 is sampled from a mixture of two Gaussian distributions, each with a variance of 0.15, centered at y and 1 − y with mixing proportions of 0.9 and 0.1, respectively. As the training set grows, the model’s ability to learn this feature improves, which in turn improves the test accuracy. The remaining 299 dimensions (x2, ..., x300) are drawn from a zero-mean normal distribution with a variance of 0.1. They constitute the nuisance subspace, which is primarily used to memorise label noise.},
    author = {Borja Rodríguez-Gálvez and Ragnar Thobaben and Mikael Skoglund},
    doi = {10.57702/vhvkfs6c},
    institution = {No Organization},
    keyword = {Color Recognition, Colored MNIST dataset, Colors, Digits, Image classification, Label noise, MNIST, MNIST dataset, Nuisance features},
    month = {dec},
    publisher = {TIB},
    title = {Colored MNIST dataset},
    url = {https://service.tib.eu/ldmservice/dataset/colored-mnist-dataset},
    year = {2024}
}