-
PNG Dataset
The PNG dataset consists of image-text pairs. Unlike datasets such as RefCOCO, the PNG dataset is characterized by lengthy descriptions of all the objects and their relationships...
-
Generalizable Entity Grounding via Assistance of Large Language Model
The GELLA framework leverages a large language model to ground the entities mentioned in long captions.