Long-term Leap Attention, Short-term Periodic Shift for Video Classification

A video transformer naturally incurs a heavier computational burden than a static vision transformer: under standard self-attention of quadratic complexity, the former processes a sequence T times longer than the latter, and therefore computes roughly T² times more attention pairs. Existing works treat the temporal axis as a simple extension of the spatial axes, shortening the spatio-temporal sequence by either generic pooling or local windowing, without exploiting temporal redundancy.
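The T² factor implied by quadratic attention can be made concrete with a short, self-contained sketch. The resolution (224×224), patch size (16), and clip length (T = 8) below are illustrative assumptions, not values taken from this entry; the sketch only shows that joint space-time attention over T·N tokens costs T² times more pairwise scores than per-image attention over N tokens.

```python
# Illustrative sketch (assumed values, not from the paper): why full
# space-time attention scales quadratically with the clip length T.

def num_patches(height: int, width: int, patch: int) -> int:
    """Spatial tokens per frame under ViT-style non-overlapping patching."""
    return (height // patch) * (width // patch)

def attention_pairs(tokens: int) -> int:
    """Pairwise score count of full self-attention: O(N^2) in sequence length."""
    return tokens * tokens

if __name__ == "__main__":
    N = num_patches(224, 224, 16)  # 196 spatial tokens per frame (assumed config)
    T = 8                          # example clip length in frames

    image_cost = attention_pairs(N)      # static vision transformer
    video_cost = attention_pairs(T * N)  # joint space-time attention

    print(f"spatial tokens per frame: {N}")
    print(f"image attention pairs: {image_cost:,}")
    print(f"video attention pairs: {video_cost:,}")
    print(f"ratio: {video_cost // image_cost}x  (= T^2 = {T**2})")
```

With these assumed values, the video sequence is 8 times longer, and the attention-pair count grows by 8² = 64 times, which is the overhead that sequence-shortening schemes such as pooling or local windowing try to reduce.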

Data and Resources

Cite this as

Hao Zhang, Lechao Cheng, Yanbin Hao, Chong-wah Ngo (2025). Dataset: Long-term Leap Attention, Short-term Periodic Shift for Video Classification. https://doi.org/10.57702/oy3jtirs

DOI retrieved: January 3, 2025

Additional Info

Created: January 3, 2025
Last update: January 3, 2025
Defined in: https://doi.org/10.1145/3503161.3547908
Author: Hao Zhang
More authors: Lechao Cheng, Yanbin Hao, Chong-wah Ngo
Homepage: https://github.com/VideoNetworks/LAPS-transformer