Clustered Data Distribution

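The dataset consists of $k$ clusters with means $\mu^{(1)}, \ldots, \mu^{(k)}$, and every example in the $j$-th cluster carries the label $y^{(j)}$. Each example is generated as follows: draw a cluster index $j \sim \mathrm{Unif}([k])$, draw $x \sim \mathcal{N}(\mu^{(j)}, \sigma^2 I_d)$, and set $y = y^{(j)}$. Moreover, we assume that $\|\mu^{(j)}\| = \sqrt{d}$ for all $j \in [k]$, that $0 < \sigma \le \max_{i \neq j} |\langle \mu^{(i)}, \mu^{(j)} \rangle| \le 4\sigma\sqrt{d}\ln(d) + 1$, and that $k \le 4\sigma\sqrt{d}\ln(d) + 1$.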
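As an illustration only, the generative process above can be sketched in Python with the standard library. This is not the authors' code: the function names, the choice of random directions rescaled to norm $\sqrt{d}$ for the means, the ±1 labels, and the values of $d$, $k$ and $\sigma$ are all assumptions, since the dataset entry does not say how the means or labels are chosen. The sketch does not enforce the inner-product or $k$ conditions. It only reports the largest cross inner product so they can be checked.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_cluster_means(k: int, d: int, rng: random.Random) -> List[Vector]:
    """Hypothetical helper: k random directions, each rescaled to norm sqrt(d)."""
    means = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(t * t for t in v))
        means.append([math.sqrt(d) * t / norm for t in v])
    return means


def max_cross_inner_product(means: Sequence[Vector]) -> float:
    """Largest |<mu^(i), mu^(j)>| over pairs i != j (0.0 if k == 1)."""
    best = 0.0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            ip = sum(a * b for a, b in zip(means[i], means[j]))
            best = max(best, abs(ip))
    return best


def sample_clustered(
    n: int,
    means: Sequence[Vector],
    labels: Sequence[int],
    sigma: float,
    rng: random.Random,
) -> Tuple[List[Vector], List[int]]:
    """Draw n pairs (x, y): j ~ Unif([k]), x ~ N(mu^(j), sigma^2 I_d), y = y^(j)."""
    xs: List[Vector] = []
    ys: List[int] = []
    for _ in range(n):
        j = rng.randrange(len(means))  # uniform cluster index (0-based here)
        # Isotropic Gaussian noise with standard deviation sigma in every coordinate.
        xs.append([m + sigma * rng.gauss(0.0, 1.0) for m in means[j]])
        ys.append(labels[j])  # every point in cluster j gets the cluster's label
    return xs, ys


# Usage with hypothetical parameters (d, k, sigma and the labels are illustrative).
rng = random.Random(0)
d, k, sigma = 50, 4, 0.5
means = random_cluster_means(k, d, rng)
labels = [1, -1, 1, -1]
X, y = sample_clustered(200, means, labels, sigma, rng)
print(len(X), len(X[0]))  # → 200 50
print("max cross inner product:", max_cross_inner_product(means))
```

Because the means in this sketch are random, nothing guarantees that the stated assumptions hold for a given draw. The printed cross inner product is one way to check the condition on $\max_{i \neq j} |\langle \mu^{(i)}, \mu^{(j)} \rangle|$ before using the samples.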

Cite this as

Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro (2025). Dataset: Clustered Data Distribution. https://doi.org/10.57702/frz0qqt8

DOI retrieved: January 3, 2025

Additional Info

Field          Value
Created        January 3, 2025
Last update    January 3, 2025
Defined In     https://doi.org/10.48550/arXiv.2303.01456
Author         Spencer Frei
More Authors   Gal Vardi, Peter L. Bartlett, Nathan Srebro