Clustered Data Distribution

doi:10.57702/frz0qqt8

The dataset consists of $k$ clusters with means $\mu^{(1)}, \dots, \mu^{(k)}$, and the examples in the $j$-th cluster are labeled by $y^{(j)}$. Each example $(x, y)$ is generated as follows: draw a cluster index $j \sim U([k])$ and $x \sim N(\mu^{(j)}, \sigma^2 I_d)$, and set $y = y^{(j)}$. Moreover, the following are assumed: $\|\mu^{(j)}\| = \sqrt{d}$ for all $j \in [k]$, $0 < \sigma \le \max_i |\langle \mu^{(i)}, \mu^{(j)} \rangle| \le 4\sigma\sqrt{d}\ln(d) + 1$, and $k \le 4\sigma\sqrt{d}\ln(d) + 1$.
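The generative process described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `sample_clustered` and `scaled_basis_means`, the choice of means as scaled standard basis vectors, and the ±1 labels are all assumptions made for the example. It also assumes the cluster index is drawn uniformly from the k clusters. Only the norm condition on the means is enforced; the remaining inequalities on σ and k are not checked.

```python
import math
import random


def scaled_basis_means(k, d):
    """Hypothetical choice of means: sqrt(d) * e_j, so every mean has norm sqrt(d)."""
    if k > d:
        raise ValueError("this illustrative construction needs k <= d")
    scale = math.sqrt(d)
    return [[scale if i == j else 0.0 for i in range(d)] for j in range(k)]


def sample_clustered(means, labels, sigma, n, seed=0):
    """Draw n examples (x, y, j) from a mixture of isotropic Gaussian clusters.

    means  : list of k mean vectors (each a list of d floats)
    labels : list of k cluster labels, labels[j] is the label of cluster j
    sigma  : standard deviation shared by all coordinates and clusters
    """
    rng = random.Random(seed)
    k = len(means)
    samples = []
    for _ in range(n):
        # Assumption: the cluster index is uniform over the k clusters.
        j = rng.randrange(k)
        # x ~ N(mu_j, sigma^2 I_d): independent Gaussian noise per coordinate.
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in means[j]]
        # Every example inherits the label of its cluster.
        samples.append((x, labels[j], j))
    return samples


# Example usage with illustrative values (not taken from the dataset record).
means = scaled_basis_means(k=3, d=5)
data = sample_clustered(means, labels=[1, -1, 1], sigma=0.5, n=10)
```

Each returned triple carries the cluster index alongside the point and its label, which makes it easy to verify that labels are constant within a cluster.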

Data and Resources

Original Metadata (JSON)
The JSON representation of the dataset and its distributions, based on DCAT.

Cite this as

Spencer Frei, Gal Vardi, Peter L. Bartlett, Nathan Srebro (2025). Dataset: Clustered Data Distribution. https://doi.org/10.57702/frz0qqt8

DOI retrieved: January 3, 2025

Additional Info

Field	Value
Created	January 3, 2025
Last update	January 3, 2025
Defined In	https://doi.org/10.48550/arXiv.2303.01456
Author	Spencer Frei
More Authors	Gal Vardi, Peter L. Bartlett, Nathan Srebro