Dataset from the paper "Fine-Grained Visual Comparisons with Local Learning". It comprises 50,025 shoe images annotated with 4 attributes, each containing 34 classes.
The dataset used in the paper is the Shoes dataset, which consists of approximately 50,000 RGB images of shoes drawn from 4 categories and more than 3,000 subcategories.