Conditional positional encodings for vision transformers

BibTeX: