Query-guided Attention in Vision Transformers for Localizing Objects Using a Single Sketch

This work addresses sketch-based object localization in natural images: given a crude hand-drawn sketch of an object, the goal is to localize all instances of that object in the target image.

BibTeX: