FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks

This paper introduces FocusCLIP, which extends CLIP pretraining with a new region-of-interest (ROI) encoder. The encoder uses heatmaps to direct the model toward key image areas, improving zero-shot transfer on human-centric tasks.

Cite this as

Muhammad Saif Ullah Khan, Muhammad Ferjad Naeem, Federico Tombari, Luc Van Gool, Didier Stricker, Muhammad Zeshan Afzal (2024). Dataset: FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks. https://doi.org/10.57702/duh1gofn

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Author: Muhammad Saif Ullah Khan
More Authors: Muhammad Ferjad Naeem, Federico Tombari, Luc Van Gool, Didier Stricker, Muhammad Zeshan Afzal