ReferItGame

Visual grounding is the task of localizing the image region described by a natural-language query. The output is often a bounding box, such as the one drawn in yellow.
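Here is a minimal sketch of what "the output is a bounding box" can look like in practice. It assumes boxes are given as pixel corner coordinates. It represents a grounding prediction as a box and compares it with a ground-truth box using intersection-over-union (IoU). The 0.5 threshold is an assumption (a common convention for scoring referring-expression localization). The `Box`, `iou`, and `is_correct` names are hypothetical and not part of any ReferItGame tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right (assumed layout)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Clamp to zero so an empty or inverted box has no area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes; 0.0 when they do not overlap."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def is_correct(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    """Hypothetical scoring rule: a prediction counts as correct when IoU >= threshold."""
    return iou(pred, gt) >= threshold


# Hypothetical boxes for a query such as "the dog on the left".
gt = Box(10, 20, 110, 120)
pred = Box(20, 30, 120, 130)
print(round(iou(gt, pred), 3), is_correct(pred, gt))  # prints 0.681 True
```

The two 100x100 boxes overlap in a 90x90 region. That gives an IoU of 8100 / 11900, so the prediction clears the assumed 0.5 threshold.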

BibTeX: