Dataset - LDM

CAE v2: Context Autoencoder with CLIP Target

Masked image modeling (MIM) learns visual representations by masking and reconstructing image patches. Applying the reconstruction supervision on the CLIP representation has been...
- Dataset
- JSON
MST: Masked Self-Supervised Transformer for Visual Representation

The proposed method is a self-supervised approach to visual representation learning that can explicitly capture the local context of an image while preserving the...
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

2 datasets found