MOCA: Masked Online Codebook Assignments prediction

Self-supervised representation learning for Vision Transformers (ViT) to mitigate the greedy needs of ViT networks for very large fully-annotated datasets.

BibTex: