When and why Vision-Language Models behave like Bags-of-Words, and what to do about it?
Playhouse and AndroidEnv
The datasets used in this paper are the Playhouse and AndroidEnv environments.
Graph-Injected Soft Prompting for Compositional Zero-Shot Learning
The paper proposes GIPCOL, a CLIP-based prompting framework for compositional zero-shot learning (CZSL).