MST: Masked Self-Supervised Transformer for Visual Representation

The proposed method is a self-supervised learning approach for visual representation learning, which can explicitly capture the local context of an image while preserving the global semantic information.

BibTex: