Conceptual Captions
The dataset used in the paper is Conceptual Captions, a large-scale dataset of images paired with captions.
FIT: Far-reaching Interleaved Transformers
We present FIT: a transformer-based architecture with efficient self-attention and adaptive computation.