HSViT: Horizontally Scalable Vision Transformer

This paper introduces the horizontally scalable vision transformer (HSViT), a scheme built on a novel image-level feature embedding. The HSViT design preserves the inductive bias of convolutional layers while reducing the number of layers and parameters in the model.

BibTeX: