Clustered Data Distribution

doi:10.57702/frz0qqt8

The dataset consists of $k$ clusters with means $\mu^{(1)}, \ldots, \mu^{(k)}$, and the examples in the $j$-th cluster are labeled $y^{(j)}$. Each example $(x, y)$ is generated as follows: we draw a cluster index $j \sim U([k])$ and a point $x \sim N(\mu^{(j)}, \sigma^2 I_d)$, and set $y = y^{(j)}$. Moreover, we assume the following: $\|\mu^{(j)}\| = \sqrt{d}$ for all $j \in [k]$, $0 < \sigma \le \max_i |\langle \mu^{(i)}, \mu^{(j)} \rangle| \le 4\sigma\sqrt{d}\ln(d) + 1$, and $k \le 4\sigma\sqrt{d}\ln(d) + 1$.
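
The generative process described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that the cluster index is drawn uniformly from the k clusters, that labels take values in {+1, -1}, and that the means are orthogonal scaled basis vectors (one convenient way to get norm √d). The function name `sample_clustered` and all parameter values are hypothetical, and the sketch does not check the remaining inequality assumptions.

```python
import math
import random


def sample_clustered(n, means, labels, sigma, seed=None):
    """Draw n labeled points from a mixture of k isotropic Gaussians.

    means  : list of k mean vectors, each a list of d floats
    labels : list of k cluster labels (assumed here to be +1 / -1)
    sigma  : standard deviation shared by every coordinate
    """
    rng = random.Random(seed)
    k = len(means)
    xs, ys, js = [], [], []
    for _ in range(n):
        j = rng.randrange(k)  # cluster index, uniform over the k clusters
        # x ~ N(mu_j, sigma^2 I_d): independent Gaussian noise per coordinate
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in means[j]]
        xs.append(x)
        ys.append(labels[j])  # every example inherits its cluster's label
        js.append(j)
    return xs, ys, js


# Hypothetical parameters chosen only for illustration.
d, k, sigma = 64, 4, 0.5
# Assumed means: sqrt(d) times the first k standard basis vectors,
# so each mean has Euclidean norm sqrt(d) and distinct means are orthogonal.
means = [[math.sqrt(d) if c == j else 0.0 for c in range(d)] for j in range(k)]
labels = [1, -1, 1, -1]

X, y, idx = sample_clustered(1000, means, labels, sigma, seed=0)
```

Returning the cluster indices alongside the points makes it easy to confirm that each label matches its cluster, or to inspect a single cluster in isolation.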

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro (2025). Dataset: Clustered Data Distribution. https://doi.org/10.57702/frz0qqt8

DOI retrieved: January 3, 2025

Additional Info

Field	Value
Created	January 3, 2025
Last update	January 3, 2025
Defined In	https://doi.org/10.48550/arXiv.2303.01456
Author	Spencer Frei
More Authors	Gal Vardi, Peter L. Bartlett, Nathan Srebro