Generalizable Entity Grounding via Assistance of Large Language Model

The GELLA framework uses a large language model to ground the entities mentioned in long captions.

BibTeX: