Nested Hierarchical Transformer

The paper does not explicitly name the datasets used, but they are implied to be ImageNet and CIFAR-10/100.

BibTeX: