Vision-Language Models - Groups

When and why Vision-Language Models behave like Bags-of-Words, and what to do about it?
- Dataset
- JSON
Open-CLIP

The Open-CLIP dataset contains pre-trained CLIP models.
- Dataset
- JSON
Playhouse and AndroidEnv

The paper uses the Playhouse and AndroidEnv environments as its datasets.
- Dataset
- JSON
Graph-Injected Soft Prompting for Compositional Zero-Shot Learning

GIPCOL is a proposed framework for compositional zero-shot learning (CZSL) that uses CLIP-based prompting.
- Dataset
- JSON

4 datasets found