Compositional Image Retrieval through Vision-by-Language (CIReVL) is a training-free approach for Zero-Shot Compositional Image Retrieval (CIR). Utilizing off-the-shelf...
The FashionIQ dataset contains images of fashion products across three categories (Dress, Toptee, and Shirt), with 46,609 images in the training set and 31,075 images in the validation set.