F-VLM

F-VLM is a state-of-the-art open-vocabulary object detection model that uses pre-trained CLIP as the frozen backbone.
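The open-vocabulary idea described above — scoring detected region features against CLIP text embeddings so the category set is not fixed at training time — can be sketched minimally. This is an illustrative sketch only; the function name and shapes are assumptions, not F-VLM's actual code.

```python
import numpy as np

def open_vocab_scores(region_feats, text_feats):
    """Score each region embedding against every category text embedding
    by cosine similarity, CLIP-style (conceptual sketch, not F-VLM's
    exact detection head)."""
    # L2-normalize both sets of embeddings.
    r = region_feats / np.linalg.norm(region_feats, axis=1, keepdims=True)
    t = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    # Cosine similarity matrix: (num_regions, num_categories).
    return r @ t.T

# Toy example: 2 region embeddings scored against 3 category embeddings.
rng = np.random.default_rng(0)
regions = rng.normal(size=(2, 8))
texts = rng.normal(size=(3, 8))
scores = open_vocab_scores(regions, texts)
print(scores.shape)
```

Because the category side is just text embeddings, new classes can be added at inference time by embedding new category names, without retraining the detector.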

Data and Resources

Cite this as

Zhou et al. (2024). Dataset: F-VLM. https://doi.org/10.57702/n9mcit0x

DOI retrieved: December 16, 2024

Additional Info

Created: December 16, 2024
Last update: December 16, 2024
Defined in: https://doi.org/10.48550/arXiv.2309.13042
Author: Zhou et al.