Structural Vision Transformer

doi:10.57702/ian4t1e5

Structural Vision Transformer (StructViT) is a vision transformer that uses structural self-attention (StructSA) to capture correlation structures in images and videos.

Data and Resources

Original Metadata (JSON)
The json representation of the dataset with its distributions based on DCAT.

Cite this as

Paul Hongsuck Seo, Cordelia Schmid, Minsu Cho, Manjin Kim (2024). Dataset: Structural Vision Transformer. https://doi.org/10.57702/ian4t1e5

DOI retrieved: December 2, 2024

Additional Info

Field	Value
Created	December 2, 2024
Last update	December 2, 2024
Author	Paul Hongsuck Seo
More Authors	Cordelia Schmid, Minsu Cho, Manjin Kim
Homepage	https://arxiv.org/abs/2103.15691