MST: Masked Self-Supervised Transformer for Visual Representation
The proposed method is a self-supervised learning approach for visual representation learning, which can explicitly capture the local context of an image while preserving the global semantic information.
BibTex: