Dataset - LDM

Multiscale Vision Transformers

Multiscale Vision Transformers (MViT) for video and image recognition, by connecting the seminal idea of multiscale feature hierarchies with transformer models.
- Dataset
- JSON

You can also access this registry using the API (see API Docs).

Before browse our site, please accept our cookies policy