Long-term Leap Attention, Short-term Periodic Shift for Video Classification

doi:10.57702/oy3jtirs

A video transformer naturally incurs a heavier computational burden than a static vision transformer, because the former processes a sequence T times longer than the latter under current attention mechanisms, whose complexity is quadratic in sequence length. Existing works treat the temporal axis as a simple extension of the spatial axes, focusing on shortening the spatio-temporal sequence by either generic pooling or local windowing, without exploiting temporal redundancy.
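The quadratic-cost argument above can be made concrete by counting the entries of the attention score matrix. The sketch below assumes N patch tokens per frame and T frames; the sizes (N = 196, T = 8, pooling factor p = 4) and the helper names are hypothetical choices for illustration, not taken from this dataset record. It compares single-frame attention, joint space-time attention, generic pooling, and local windowing with one frame per window. It does not implement the leap attention or periodic shift named in the title.

```python
def attention_pairs(num_tokens: int) -> int:
    """Query-key pairs (score-matrix entries) for full self-attention."""
    return num_tokens * num_tokens


def windowed_attention_pairs(num_tokens: int, window: int) -> int:
    """Query-key pairs when tokens attend only within non-overlapping
    windows of `window` consecutive tokens (hypothetical helper)."""
    if num_tokens % window != 0:
        raise ValueError("num_tokens must be divisible by window")
    num_windows = num_tokens // window
    return num_windows * window * window


# Hypothetical sizes: 14x14 = 196 patch tokens per frame, T = 8 frames.
N, T = 196, 8

# Static vision transformer attending over one frame.
image_cost = attention_pairs(N)

# Joint space-time attention: the sequence is T times longer,
# so the score matrix grows by a factor of T**2.
video_cost = attention_pairs(T * N)
print(video_cost // image_cost)  # -> 64

# Generic pooling: shrink the sequence by a factor p before attention,
# which cuts the cost by p**2.
p = 4
pooled_cost = attention_pairs(T * N // p)
print(video_cost // pooled_cost)  # -> 16

# Local windowing with one frame per window (tokens ordered frame by frame):
# cost becomes T * N**2, i.e. T times cheaper than joint attention.
windowed_cost = windowed_attention_pairs(T * N, window=N)
print(video_cost // windowed_cost)  # -> 8
```

Both reductions shorten or partition the sequence without regard to how similar neighbouring frames are, which is the temporal redundancy the abstract says existing works leave unused.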

BibTeX:

@dataset{Hao_Zhang_and_Lechao_Cheng_and_Yanbin_Hao_and_Chong-wah_Ngo_2025,
    abstract = {A video transformer naturally incurs a heavier computational burden than a static vision transformer, because the former processes a sequence T times longer than the latter under current attention mechanisms, whose complexity is quadratic in sequence length. Existing works treat the temporal axis as a simple extension of the spatial axes, focusing on shortening the spatio-temporal sequence by either generic pooling or local windowing, without exploiting temporal redundancy.},
    author = {Hao Zhang and Lechao Cheng and Yanbin Hao and Chong-wah Ngo},
    doi = {10.57702/oy3jtirs},
    institution = {No Organization},
    keyword = {spatio-temporal sequence, temporal attention, video classification},
    month = {jan},
    publisher = {TIB},
    title = {Long-term Leap Attention, Short-term Periodic Shift for Video Classification},
    url = {https://service.tib.eu/ldmservice/dataset/long-term-leap-attention--short-term-periodic-shift-for-video-classification},
    year = {2025}
}