CAE v2: Context Autoencoder with CLIP Target
Masked image modeling (MIM) learns visual representations by masking and reconstructing image patches. Applying the reconstruction supervision on the CLIP representation has been...
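The masking-and-reconstruction idea behind MIM can be sketched in a few lines of NumPy. This is a generic illustration under stated assumptions, not CAE v2's actual method: the 196-patch grid, 512-dimensional features, 50% mask ratio, and the helper names `random_patch_mask` and `masked_reconstruction_loss` are all hypothetical choices. The `target` array is a random stand-in for per-patch features that, in a CLIP-target setup, would come from a frozen CLIP image encoder. The loss is computed only on masked patches.

```python
import numpy as np

def random_patch_mask(num_patches: int, mask_ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical helper: boolean mask of shape (num_patches,), True where a patch is masked."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    # Pick distinct patch indices uniformly at random to hide from the encoder
    mask[rng.choice(num_patches, size=num_masked, replace=False)] = True
    return mask

def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Hypothetical helper: mean squared error over masked patches only."""
    diff = pred[mask] - target[mask]
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
num_patches, dim = 196, 512  # assumed 14x14 patch grid and assumed feature size
mask = random_patch_mask(num_patches, mask_ratio=0.5, rng=rng)

# Stand-in targets; a CLIP-target variant would use frozen CLIP patch features here
target = rng.standard_normal((num_patches, dim))
# Stand-in predictions, as if produced by a model from the visible patches
pred = target + 0.1 * rng.standard_normal((num_patches, dim))

print(int(mask.sum()))  # 98 of 196 patches are masked
print(masked_reconstruction_loss(pred, target, mask))
```

Restricting the loss to masked positions is what forces the model to infer hidden content from visible context rather than copy its input. Swapping the random `target` for features from a pretrained encoder is what changes the task from pixel reconstruction to feature (for example CLIP) reconstruction.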
Remote Sensing Scene Classification with Masked Image Modeling (MIM)
Remote sensing scene classification has been extensively studied for its critical roles in geological survey, oil exploration, traffic management, earthquake prediction,...
MOCA: Masked Online Codebook Assignments prediction
Self-supervised representation learning for Vision Transformers (ViT), aimed at reducing ViT networks' heavy demand for very large, fully annotated datasets.